INTRODUCTION


1.A PROJECT OBJECTIVE

The  objective  of  this  project  is  to  develop  a  “systems-on-chip”  implementation  of  a  PicoNode, 
which  can  provide  all  the  communication,  computation,  and  geolocation  functions  necessary  for 
an  adaptive  distributed  sensor-and-monitor  network.  The  monolithic  integration  of  the 
communication  and  computation  components  will  allow  orders  of  magnitude  reduction  in  cost, 
size, and power consumption of the distributed sensor nodes. The final node will occupy less
than 0.15 inch³, and will consume less than 1 mW. The node will feature the necessary
flexibility  to  support  a  highly  adaptive  and  programmable  wireless  link,  and  the  dynamic  trading 
off  between  communication  and  computation,  as  necessitated  by  the  varying  costs  of 
communication. 


1.B APPROACH

The  use  of  state-of-the-art  CMOS  technology  and  the  most  advanced  system-on-a-chip  design 
methodology  enables  the  integration  of  all  the  communications  and  computation  functions 
required  between  the  antenna  and  the  sensor  for  a  distributed  sensor  network  in  a  single  chip, 
called  a  PicoNode.  This  includes  the  analog  RF  communication  and  sensor  interface  circuitry, 
localization,  as  well  as  digital  computation  implemented  as  a  balanced  mixture  of  programmable, 
reconfigurable  and  dedicated  components.  A  3-phase  progression  of  prototype  implementations 
will lead to the final single-chip PicoNode, each time reducing the size and power dissipation
by approximately a factor of 10. PicoNode I will be made out of commercial off-the-shelf
components  (Year  1),  PicoNode  II  is  a  multi-chip  implementation,  integrating  the  most  energy 
consuming  portions  of  the  design  (Year  2),  while  PicoNode  III  represents  the  fully  integrated 
sensor  and  monitor  node  (Year  3). 

A  system  design  approach,  which  jointly  optimizes  the  algorithmic  research,  the  node 
architecture  and  hardware,  and  the  software  environment,  will  be  used.  This  process  exploits  the 
close  industry  interactions  of  the  Berkeley  Wireless  Research  Center,  which  provide  access  to 
state-of-the-art  design  tools,  methodologies,  and  fabrication  technologies. 


1.C RECENT ACCOMPLISHMENTS

• 60 units of PicoNode I operational and in active use. Average power dissipation of 460 mW
for a total node size of 18 inch³.

• Multi-hop ad-hoc network (media access + network + application layer) running on
PicoNode I test-bed. Lifetime of node: 26 hours.

• Chip-set for PicoNode II completely functional. Both the protocol and baseband processors
have been fabricated and tested. A test-board combining the two chips with an off-the-shelf
RF front-end has been constructed and is operational, delivering a complete wireless
transceiver solution. The peak power dissipation of the two digital chips is approximately 15
mW, which is below the estimated value of 20 mW. For the digital processing, this
represents a reduction by a factor of 23 over PicoNode I.

• PicoNode III: a <1 mW, 0.6 inch³ integrated wireless transceiver for wireless sensor
networks, powered by energy scavenging, to be fully integrated and operational by the end of
the project.

  - The system design of the node (component selection, partitioning, simulation) has
    been completed.

  - An innovative low-energy front-end based on FBAR micro-resonators has been
    developed. Two test chips have been designed and tested. The
    operation of a complete radio chain has been demonstrated using a chip-on-board
    implementation. A fully integrated version is currently in fab and is expected back
    by late February.

  - Behavioral specification of the digital network processor (which combines the physical
    layer, data-link and multi-hop network protocols, localization, and application
    functions) is operational. A full version of the processor has been emulated on a
    Virtex-II FPGA.

  - The digital network processor introduces the concept of power domains. Unused
    modules are powered down either completely or to the retention voltage to reduce
    leakage power. A power-down SRAM test chip has been designed and tested,
    demonstrating the validity of the concept.

  - An energy-scavenging power train based on solar power has been tested and
    characterized. A prototype package has been designed.

This  project  fully  met  its  original  goals.  Over  the  3  generations,  the  power  dissipation  of  the 
wireless transceiver node has been reduced by a factor of 460, while the volume of the node was
reduced by a factor of 30.

• RF front-end prototype chips:

  - FBAR-based oscillator has been fabricated and tested (300 mW).

  - Test chips containing all components of the RF transceiver have been fabricated and
    are being tested.

• Digital network processor:

  - Major components of the chip (memory controller, memory, MAC) have been
    evaluated and characterized.

  - Behavioral spec is operational.

  - FPGA version operational.

• Memory test chip has been fabricated and is being tested.

• Energy scavenging power train has been tested and characterized.


1.D TECHNOLOGY TRANSFER

The research in this project is performed at the Berkeley Wireless Research Center (BWRC),
which is a university-affiliated research consortium with 10 companies (Agilent, Atmel,
Cadence, Ericsson, Hewlett Packard, Hitachi, Infineon, Intel, Qualcomm, and SGS-Thomson).
A high priority in the design of the Center has been to facilitate collaboration between
researchers from the member companies and the Center faculty, staff, and students, thus providing
the best possible situation for technology transfer. A number of the member companies are
directly involved in the PicoRadio research and its results (Ericsson, Intel, Cadence,
SGS-Thomson, Hewlett Packard, and Agilent).

Furthermore,  this  project  is  at  the  core  of  some  very  ambitious  projects,  applying  low-energy 
wireless  transceiver  technology  made  available  through  this  program.  The  most  important  one  is 
the  $350  M  University  of  California  CITRIS  Institute,  which  focuses  on  the  development  of 
societal  scale  information  systems,  addressing  large  problems  that  hamper  society  at  large  such 
as  traffic  management,  energy  consumption  and  disaster  mitigation.  PicoRadio  sensor  networks 
form the backbone of these societal systems. An application that is already being prototyped is
the Smart Home. The combination of integrated sensors, actuators, and controllers helps to
increase quality of living and the energy efficiency of large office buildings. These projects are
cooperative efforts between BWRC, the Berkeley Sensor and Actuator Center (BSAC), and the
Center for the Built Environment (CBE) and their many industrial partners.

Finally, the PicoRadio project has received major attention. The paper “PicoRadios for Wireless
Sensor Networks: The Next Challenge in Ultra-Low Power Design,” presented at the 2002
ISSCC conference, has been awarded the ISSCC 2002 Jack Raper Outstanding Technology
Directions Paper Award. ISSCC is the premier conference in the area of semiconductor
integrated circuits. PicoRadio was featured in the Wireless Review Magazine as one of the
exciting emerging technologies. In addition, PicoRadio technologies have been or will be featured in
a number of keynote presentations in 2002 and 2003 (IBM Asceed, CoolChips VI in Japan, etc.).

TECHNICAL OVERVIEW


2.A PICONODE I - PICORADIO TEST BED


PicoNode  I  -  PicoRadio  Test  Bed  Boards 


2.A.1  PicoRadio  Test  Bed  Hardware  and  Development  System 

2.A.1.1 Architecture and implementation

Authors: Fred Burghardt and Susan Mellers

In  order  to  enable  real-world  investigation  into  system-level  aspects  of  a  PicoRadio  network 
before  the  PicoNode  devices  are  available  (and  also  to  help  determine  how  a  PicoNode  should  be 
designed),  a  prototyping  environment  was  built.  This  environment  is  referred  to  as  PicoRadio  I, 
or  the  PicoRadio  Test  Bed. 

The  PicoRadio  Test  Bed  is  a  collection  of  hardware  and  the  algorithms  that  run  on  the  nodes. 
Each  node  is  composed  of  two  major  parts:  a  set  of  custom  circuit  boards  and  a  collection  of 
software libraries that allow PicoRadio designers to make use of the hardware. The boards are
small,  stackable  units  that,  when  assembled,  fit  into  a  custom  case  designed  by  students  from  the 
Dept  of  Mechanical  Engineering  at  UCB.  The  PicoRadio  Test  Bed  is  composed  of  two  core 
boards:  the  digital  board  and  the  power  board.  The  various  boards  comprising  a  PicoRadio  Test 
Bed  are  shown  in  Figure  2. 

The digital board contains a StrongARM 1100 embedded microprocessor and a Xilinx
XC4020XLA Field Programmable Gate Array (FPGA). The ARM is used to emulate
functionality that may be mapped into a general-purpose processor or DSP core. It provides a
CPU core and a variety of controllers for services such as standard I/O control and timers. The
FPGA is used to emulate tasks that would be assigned to configurable or custom logic on a
PicoNode.

The power board provides power to the digital board, and contains an auxiliary 5 V supply.
Dynamic voltage scaling is applied to the ARM's 1.5 V core supply.

In  addition  to  the  core  boards,  a  Test  Bed  node  includes  a  radio  board  and,  optionally,  a  sensor 
board.  A  Bluetooth  radio  is  the  RF  front  end  for  the  Test  Bed,  because  it  models  the  short  range 
of  the  PicoRadio  III  nodes. 

Sensor  board  I  was  designed  in  cooperation  with  the  Center  for  the  Built  Environment  (CBE), 
part  of  the  Dept  of  Architecture  at  UCB.  Three  of  the  sensors  on  the  board  are  types  most  likely 
to  be  found  in  a  Smart  Building  sensor  network:  temperature,  humidity,  and  light  intensity.  The 
board  also  contains  a  microphone  and  speaker  driver,  which  are  intended  to  be  part  of  an 
acoustic  anemometer  capable  of  measuring  very  low  levels  of  air  movement  inside  a  building 
(Karalar  2002).  This  sensor  board  has  been  used  extensively  in  data  collection  activities  here  at 
the  Center  and  for  system  demonstrations  at  BWRC  retreats. 

Sensor  board  II  was  designed  in  cooperation  with  the  Dept  of  Civil  Engineering  at  UCB.  The 
motivation  was  to  provide  a  means  of  instrumenting  earthquake  simulations  on  structures.  For 
this  purpose,  the  board  contains  a  two-axis  accelerometer  and  a  two-axis  magnetometer.  The 
magnetometer  is  primarily  used  to  provide  orientation  for  the  accelerometer  data  so  that  node 
positioning  is  not  critical.  As  an  exercise,  inclinometer  and  compass  applications  were  designed 
to test the board; both use the node's status LEDs as a display. These applications also provide for
interesting  demonstrations.  A  GPS  circuit  was  included  in  the  board  design,  but  as  of  now  the 
boards  have  not  been  populated  for  GPS. 


Figure 2 (board labels): Digital Board (mapping of protocol layers); Sensor Board I (sound,
light, humidity, and temperature sensors); Sensor Board II (accelerometer and magnetometer);
Bluetooth Radio Board.


An  ARM/FPGA  development  infrastructure  has  been  created  to  support  this  hardware.  Design 
environments  for  both  the  processor  and  the  FPGA  are  currently  in  use.  The  ARM  environment 
includes  project  management,  code  composition,  debugging,  and  compilation.  The  FPGA 
environment  includes  schematic  capture,  VHDL  composition,  simulation,  synthesis,  and 
program  file  compilation. 

For  the  ARM,  a  “kernel”  has  been  developed  that  provides  easy  access  to  resources  such  as  the 
interrupt  controller,  timers,  power  control,  a  real-time  clock,  general-purpose  I/O,  serial  ports, 
and  a  port  abstraction  for  FPGA  I/O.  The  kernel  also  contains  data  structure  packages  and 
support for pre-built FPGA circuit blocks. For the FPGA, a set of blocks is available that
provides functions such as ARM I/O, Tx/Rx data paths, FIFOs, a TDMA MAC, and mappings for
all I/O pins.

Full system deployment of sixty PicoRadio test bed nodes has been completed. Protocol
development and test results are reported below.


2.A.1.2  Concurrent  design  of  electrical  and  mechanical  components 

Authors:  Dan  Odell  and  Michael  Montero 

A  case  is  required  to  protect  the  PicoNode  I  physical  components.  The  node  consists  of  four 
printed  circuit  boards  (PCBs),  two  batteries,  battery  contacts,  a  power  switch,  an  antenna,  a  case, 
a  lid,  and  two  edge  windows  for  access  to  connectors.  Many  of  these  components  have  both 
mechanical  and  electrical  requirements  that  they  must  fulfill.  Designers  of  both  the  enclosure  and 
the  circuit  boards  must  participate  in  the  selection  and  design  processes  to  ensure  that  the 
components  meet  the  requirements  of  the  entire  system  and  not  solely  those  of  the  electrical  or 
mechanical  domain. 

In order to facilitate this collaborative design process, a unified domain design environment is
being developed to address the needs of electro-mechanical product designs. The tool, called
DUCADE (Domain Unified Computer Aided Design Environment), enables designers from the
electrical and mechanical domains to exchange pertinent design information throughout the life
cycle of the product design. DUCADE allows PCB design and development to occur
concurrently with the mechanical design of the product’s enclosure. Issues such as thermal
conductivity, geometric interference, and IC component placement are resolved jointly by
designers from both domains. This promotes parallel product design, which reduces the number
of redesign iterations and hence lowers cost and time.

An  enclosure  was  designed  for  the  PicoRadio  Test  Bed  “stack”  and  a  prototype  was  created  on 
the Mechanical Engineering Department’s Fused Deposition Modelling (FDM) machine. A
production  run  of  150  cases  was  completed,  and  50  fully  assembled  nodes  are  now  available  for 
deployment.  Figure  3  shows  a  typical  production  PicoRadio  Test  Bed  case,  and  Figure  4  shows 
an  exploded  view. 


Figure 3: Node with final case (sensor board is on top), and case with lid and boards removed.

Figure 4: Exploded view of case with lid at bottom, to show battery socket area. Labeled parts
include the power PCB and a dummy PCB.


2.A.2  PicoNode  III  implementation  on  test  bed 

Author:  Johnathan  Reason 

With  the  exception  of  the  Physical  layer,  the  entire  PicoNode  III  protocol  stack  has  been 
integrated  into  PicoNode  I.  Over  the  past  twelve  months,  we  have  subjected  this  protocol  stack  to 
extensive  testing  and  debugging,  which  has  led  to  some  important  functional  refinements, 
particularly  in  the  Datalink  and  Network  layers.  Additionally,  we  have  developed  a  network 
management  system  that  allows  us  to  monitor  and  manage  the  performance  of  our  network.  At 
the  BWRC  retreat  in  June  2002,  we  made  our  debut  demonstration  of  an  ad  hoc,  multi-hop, 
sensor  network  using  the  PicoNode  III  protocols  running  on  the  Test  Bed  hardware.  Based  on 
lessons  learned  from  this  demonstration,  subsequent  demonstrations,  and  lab  testing,  we  have 
achieved a low-energy, stable, and robust protocol stack.

2.A.2.1 Protocol integration

Integrating  the  PicoNode  III  protocols  into  the  Test  Bed  hardware  was  a  major  milestone  for 
three  reasons.  First,  this  was  the  first  time  all  the  layers  of  the  protocol  stack  were  brought 
together on a single platform. Thus, the functional verification of inter-layer semantics and
inter-node communications actually took place in the Test Bed. Secondly, testing the protocol stack in
a  real  world  environment  allowed  us  to  identify  some  shortcomings  in  the  functional  behavior  of 
each  layer,  especially  regarding  robustness.  Lastly,  the  Test  Bed  implementation  turned  out  to  be 
a  better  starting  point  for  the  PicoNode  III  system  on  a  chip  (SoC)  implementation  than 
anticipated. 

Currently,  the  PicoRadio  network  in  the  Test  Bed  contains  three  types  of  nodes:  sensors, 
controllers,  and  anchors.  Sensor  nodes  gather  and  forward  sensor  measurements.  Controller 
nodes  primarily  initiate  commands  to  the  network  (e.g.,  requests  for  sensor  measurements)  and 
serve  as  the  end  destination  where  sensor  nodes  forward  their  responses.  Anchor  nodes  provide 
static  position  references  within  the  environment  by  periodically  broadcasting  their  location  to 
the  network.  Controller  and  anchor  nodes  typically  have  a  hard-wired  power  source  and  can 
optionally  be  configured  to  gather  and  forward  sensor  measurements  too.  Additionally,  controller 
nodes  are  physically  connected  to  a  computer  via  a  serial  cable.  In  a  typical  deployment  scenario, 
per  room  there  might  be  one  controller,  at  least  four  anchors,  and  thirty  or  more  sensors. 

The  figures  and  table  below  illustrate  how  each  layer  is  mapped  onto  the  Test  Bed  hardware  and 
what  primary  components  and  functions  comprise  each  layer.  With  the  exception  of  the  sensor 
boards,  the  Application  layer  is  implemented  in  software  that  is  executed  on  the  StrongARM 
processor.  The  Network  layer  is  solely  implemented  in  software;  however,  it  is  important  to  note 
that  this  is  only  because  we  had  very  limited  reconfigurable  resources  in  the  FPGA.  To  provide 
better  power  management,  some  parts  of  the  Network  layer  will  be  implemented  in  hardware  in 
the  SoC  platform  of  PicoNode  III. 


2.A.2.2  Application  layer 

The  Application  layer  consists  of  one  standard  sensor  board  (Sensor  Board  I),  one  optional 
sensor  board  (Sensor  Board  II),  the  Monitor  software  module,  and  application  drivers  that 
provide  the  interface  between  the  Application  and  Network  layers.  Each  node  is  equipped  with  at 
least  Sensor  Board  I,  which  provides  a  microphone  and  temperature,  light  and  humidity  sensors. 
Some  nodes  are  also  equipped  with  the  second  board  that  allows  for  more  advanced  applications 
like  motion  detection  via  an  accelerometer  and  magnetometer.  The  Monitor  is  a  small  software 
module  that  interacts  with  the  real  world  through  user  applications  that  control  and  monitor  the 
network.  The  application  drivers  interpret  incoming  packets  as  controls  that  activate  the  various 
sensors,  as  well  as  assemble  outgoing  sensor  measurements  and  Monitor  requests  into  packets. 


Figure  5 


2.A.2.3  Network  layer 

The Network layer is comprised of four macro components: the Energy-Aware Routing
Algorithm, the Location Service, the Neighbor List Service, and the Queuing Service. The
Energy-Aware Routing Algorithm is the primary function of the Network layer and its details
were described in the PicoRadio Year Two report for 2001 (see Section 2C.1.1). The Location
Service is comprised of the algorithm and protocol that allow each node to dynamically
discover its position relative to anchor nodes (see details in 2001 report Section 2C.2.1). In the
PicoRadio network, a node's location is analogous to the concept of network address found in
most networks. Therefore, through the remainder of this report, we will use the terms location
and network address interchangeably. Thus, the Location Service is considered a sublayer of the
Network layer because it provides the means by which a PicoNode dynamically configures its
network address. The Queuing Service provides the interface between the Network and Datalink
layers.

Layer        Macro Component                 Primary and Secondary Functions

Application  Sensor Boards I & II            1. Gather sensor measurements
             Monitor Module                  2. Request sensor measurements
             Application Drivers             3. Assemble/disassemble control and
                                                measurement packets

Network      Energy-Aware Routing Algorithm  1. Next-hop data routing/broadcast forwarding
             Location Service                2. Dynamic location configuration
             Neighbor List Service           3. Neighbor address resolution
                                                a. neighborhood maintenance
                                                b. dynamic MAC ID configuration
                                                c. initialization management

Datalink     MAC                             1. Access control
             TX/RX Datapaths                 2. Link reliability control
                                             3. Datapath control and flow

Physical     Radio                           1. Transmit and receive bits

Table 1: Components and functions of each layer in the PicoRadio protocol stack


The  Neighbor  List  Service  (NLS)  performs  address  resolution  for  the  Network  layer,  a  function 
that  is  commonly  found  in  most  networks  (e.g.,  the  Address  Resolution  Protocol  in  the  Internet 
protocol  suite).  It  accomplishes  this  by  maintaining  a  table  that  contains  a  mapping  between  its 
one-hop  neighbors’  media-access  (MAC)  IDs  and  network  addresses.  Each  entry  in  this  table 
also  includes  other  information  useful  for  routing  such  as  a  link  quality  metric  and  a  status 
indicator.  In  a  typical  query,  the  routing  algorithm  may  provide  the  NLS  with  a  location  and 
receive  back  the  triplet  (Status,  MAC  ID,  Link  Metric).  The  NLS  also  manages  the  timing  of 
events  during  the  initialization  process,  which  consists  of  discovering  the  neighborhood, 
computing  its  location,  configuring  its  MAC  ID,  and  joining  the  neighborhood. 

One  important  contribution  of  the  Test  Bed  is  the  major  refinement  of  the  NLS.  Extensive  testing 
showed  that  the  performance  of  routing  is  strongly  dependent  on  the  maintenance  strategy  of  the 
neighbor  list  table.  For  example,  early  versions  of  the  NLS  added  and  removed  a  neighbor  to  its 
table  with  only  the  notion  of  how  frequently  it  heard  (or  didn’t  hear)  special  control  messages 
from  a  neighbor.  This  approach  proved  to  lack  robustness  in  a  real  world  scenario  because  it  did 
not  really  capture  link  quality.  In  the  current  approach,  a  node  only  adds  (or  removes)  a  neighbor 
when  the  quality  of  the  link  between  itself  and  its  neighbor  has  been  tested  and  the  link  metric  is 
above  (or  below)  an  acceptable  threshold.  Additionally,  we  refined  the  layering  position  of  this 
macro component, which was originally considered to be a sublayer of the Datalink layer. The
new  layering  helped  us  maintain  modularity  and  a  simple  interface,  which  greatly  facilitated 
debugging. 
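
The table lookup and the link-quality-based maintenance policy described above can be sketched as follows. This is only an illustration, not the Test Bed code: the class and method names, the use of a coordinate pair as the location, the numeric link metric (higher is better), and the threshold value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumption: a location (network address) is modeled as a 2-D coordinate pair.
Location = Tuple[float, float]


@dataclass
class Neighbor:
    mac_id: int
    link_metric: float      # assumption: higher means a better link
    status: str = "active"  # hypothetical status value


class NeighborList:
    """Maps one-hop neighbors' locations to (status, MAC ID, link metric)."""

    def __init__(self, threshold: float) -> None:
        # Hypothetical acceptance threshold for the link metric.
        self.threshold = threshold
        self.table: Dict[Location, Neighbor] = {}

    def report_link_test(self, location: Location, mac_id: int, metric: float) -> None:
        """Add or refresh a neighbor whose tested link is acceptable,
        and remove it when the tested link falls below the threshold."""
        if metric >= self.threshold:
            self.table[location] = Neighbor(mac_id, metric)
        else:
            self.table.pop(location, None)

    def query(self, location: Location) -> Optional[Tuple[str, int, float]]:
        """Return the (Status, MAC ID, Link Metric) triplet, or None if unknown."""
        entry = self.table.get(location)
        if entry is None:
            return None
        return (entry.status, entry.mac_id, entry.link_metric)
```

In this sketch a neighbor enters the table only after a link test reports a metric at or above the threshold, so merely hearing a control message from a node is never enough to route through it.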


2.A.2.4  Datalink  layer 

The Datalink layer is comprised of three macro components: the Transmit Controller and
Datapath (TCD), the Receive Controller and Datapath (RCD), and the Media Access Controller
(MAC). The TCD and RCD interface with the Queuing Service of the Network layer and control
the datapath functions: transmit/receive buffering, serialization/de-serialization, cyclic
redundancy checking, and line balancing. The PicoRadio Test Bed MAC supports the following
features:


•  Broadcast 

•  Request  to  Send  (RTS)  /Clear  to  Send  (CTS)  style  unicast  data  transfers  with  medium 
reservation 

•  Receiver  duty  cycling 

•  Two-channel  or  multi-channel  configuration 

The  receiver  duty  cycling  feature  or  cycled-receiver  is  a  concept  borrowed  from  paging  systems. 
It is widely used in many MAC designs (e.g., 802.11 sleep mode) as a way to reduce the
receiver’s  idling  power  consumption.  The  idea  is  to  turn  the  transceiver’s  idle  mode  into  a  low- 
power  sleep  mode  by  periodically  duty  cycling  the  receiver,  as  opposed  to  leaving  the  receiver 
on  100%  of  the  time. 
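
As a rough illustration of why cycling the receiver helps, the average idle power is the time-weighted mean of the listening power and the sleep power. The sketch below assumes a simple two-state model that ignores wake-up transients; the function name and the example numbers are hypothetical and are not Test Bed measurements.

```python
def average_receiver_power(p_on_mw: float, p_sleep_mw: float, duty_cycle: float) -> float:
    """Average power (mW) of a receiver that listens for a fraction
    `duty_cycle` of the time and sleeps for the remainder."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * p_on_mw + (1.0 - duty_cycle) * p_sleep_mw


# Hypothetical numbers: 100 mW while listening, 1 mW asleep, receiver on 10% of the time.
always_on = average_receiver_power(100.0, 1.0, 1.0)
cycled = average_receiver_power(100.0, 1.0, 0.1)
print(f"always on: {always_on:.1f} mW, cycled: {cycled:.1f} mW")
```

Under this model the saving is bounded by the sleep power: once the duty cycle is low enough, the sleep term dominates and further reduction in on-time yields little.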

The  Test  Bed  Bluetooth  radios  support  64  channels  in  the  frequency  range  from  2.402  GHz  to 
2.480  GHz.  In  the  two-channel  configuration,  we  use  one  channel  to  send  broadcast  messages 
and  another  channel  to  send  unicast  messages.  In  the  multi-channel  configuration,  we  still  only 
use  one  channel  for  broadcast  messages,  but  we  employ  orthogonality  in  frequency  for  unicast 
messages.  In  particular,  each  node  receives  unicast  messages  on  a  locally  independent  channel 
that  corresponds  to  its  MAC  ID.  For  example,  a  node  with  MAC  ID  24  will  receive  unicast 
messages  on  channel  24.  MAC  IDs  range  from  1  to  63. 
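
The multi-channel rule above amounts to a direct mapping from a destination MAC ID to a receive channel. The following sketch shows that mapping under stated assumptions: the report does not say which channel carries broadcasts, so channel 0 is chosen here purely for illustration, and the function names are hypothetical.

```python
from typing import Optional

NUM_CHANNELS = 64        # from the text: the radios support 64 channels
MIN_MAC_ID = 1           # from the text: MAC IDs range from 1 to 63
MAX_MAC_ID = 63
BROADCAST_CHANNEL = 0    # assumption: the broadcast channel is not named in the report


def unicast_channel(mac_id: int) -> int:
    """Channel on which the node with this MAC ID receives unicast messages."""
    if not MIN_MAC_ID <= mac_id <= MAX_MAC_ID:
        raise ValueError(f"MAC ID {mac_id} is outside {MIN_MAC_ID}..{MAX_MAC_ID}")
    return mac_id


def transmit_channel(dest_mac_id: Optional[int]) -> int:
    """Channel to transmit on; a destination of None denotes a broadcast."""
    if dest_mac_id is None:
        return BROADCAST_CHANNEL
    return unicast_channel(dest_mac_id)
```

With these definitions, `transmit_channel(24)` selects channel 24, matching the example in the text, while `transmit_channel(None)` selects the assumed broadcast channel.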


2.A.2.5  Low-power  features 

Each  layer  of  the  protocol  stack  has  components  that  incorporate  specific  energy-aware  or  low- 
power  features  (e.g.,  medium  reservation  in  the  MAC).  In  addition,  each  macro  component  has 
been  designed  to  support  a  power  management  interface,  which  can  be  used  to  turn  components 
off  when  they  are  idling.  Since  there  is  no  way  to  turn  off  the  power  to  individual  components  in 
the  Xilinx,  power  management  is  not  actually  implemented  in  the  Test  Bed.  However,  this 
feature  of  the  component  interfaces  will  be  used  by  the  power  manager  in  the  PicoNode  III 
implementation. 
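
One way to picture such a power management interface is sketched below. It is a hypothetical model, not the PicoNode design: the class names, the `busy` flag used to decide idleness, and the sweep-based policy are assumptions introduced for the example.

```python
from typing import Iterable, List


class ManagedComponent:
    """Hypothetical macro component exposing a power management interface."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.powered = True
        self.busy = False

    def is_idle(self) -> bool:
        return not self.busy

    def power_down(self) -> None:
        self.powered = False

    def power_up(self) -> None:
        self.powered = True


class PowerManager:
    """Turns components off when they are idle and wakes them on demand."""

    def __init__(self, components: Iterable[ManagedComponent]) -> None:
        self.components = list(components)

    def sweep(self) -> List[str]:
        """Power down every idle component; return the names switched off."""
        switched_off = []
        for component in self.components:
            if component.powered and component.is_idle():
                component.power_down()
                switched_off.append(component.name)
        return switched_off

    def request(self, name: str) -> ManagedComponent:
        """Wake the named component, if necessary, before it is used."""
        for component in self.components:
            if component.name == name:
                if not component.powered:
                    component.power_up()
                return component
        raise KeyError(name)
```

Keeping the interface identical across components means the manager needs no knowledge of what each component does, only whether it is idle.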


2.A.2.6  Design  flow 

Figure 6 below illustrates the Test Bed design flow. Most of the design was captured using
language-based tools. All the application drivers and network functions are implemented in C
code. We use the ARM Compiler to target the SA-1110 processor. Most of the control functions
in the Datalink layer are written in HandelC, which is a hardware language for concurrent
programming. It is based on the Communicating Sequential Processes (CSP) programming
language. HandelC uses a subset of ANSI C with some additional syntactical constructs to
support hardware design. The HandelC compiler can produce optimized EDIF 2.0, VHDL, or
Verilog targeted for a specific FPGA (or CPLD). We use schematic capture primarily to specify
design connectivity. We also use it partially to specify the interface abstraction between the
FPGA, radio, and processor. We use the Xilinx ISE Foundation tools to perform all the other
synthesis steps (e.g., mapping, place and route). The FPGA synthesis process is fully automated
using custom scripts. Once a Xilinx image is synthesized, we use the ARM Debugger to
simultaneously load the image and the ARM executable into the PicoNode’s flash RAM.


Figure 6: PicoNode I Design Flow: A macro component view of the programmable Test Bed design flow.
Design inputs shown: C code for Application and Network layers; VHDL and Verilog to specify
the interface abstraction; HandelC code for control functions in Datalink layer; schematics to
specify design connectivity. Outputs shown: firmware, loaded onto the PicoNode.


Currently,  there  is  no  viable  simulation  engine  in  this  design  flow.  Thus,  all  design  verification 
and  refinement  is  based  on  real  hardware  and  real  world  experimentation.  Although  this  might 
not  be  the  ideal  approach,  we  found  it  to  be  the  most  expedient  approach.  All  of  the  tools  we 
investigated  for  our  design  flow  proved  to  be  inadequate  for  at  least  one  of  the  following  reasons: 

•  No  path  to  FPGA  or  ASIC  synthesis 

•  Did  not  adequately  simulate  the  intra-  and  inter-component  concurrency  and  inter-layer 
semantics 

•  Too  slow  in  simulating  multi-node  scenarios 

Recently though, researchers at our center have developed a design flow that shows promise in
overcoming these shortcomings. Their design flow uses MATLAB (i.e., Simulink and
Stateflow) for design capture and simulation. From these high-level descriptions, FPGA and
ASIC synthesis is possible. We are currently experimenting with this flow for part of the
PicoNode III implementation.

2.A.2.7 Network management and maintenance

To facilitate testing, debugging, and performance measurement, we designed a network
management  subsystem  that  we  call  the  Statistics  and  Management  Service  (SMS).  SMS  is  an 
independent  subsystem  that  can  optionally  be  enabled  on  each  PicoNode  I  system.  It  operates  by 
employing  a  request/response  paradigm  similar  to  the  interaction  between  controllers  and 
sensors.  When  SMS  is  enabled,  a  PicoNode  can  be  configured  as  either  an  SMS  controller  or  an 
SMS  agent.  An  SMS  controller  sends  requests  for  management  data  and  SMS  agents  respond 
with  the  data.  Typically,  controller  nodes  are  configured  as  SMS  controllers  and  sensor  nodes  as 
SMS  agents.  The  management  data  is  a  table  of  variables  maintained  by  each  SMS  agent.  An 
SMS  controller  can  request  a  single  variable  (e.g.,  the  number  of  CRC  failures)  or  the  entire  table 
from  a  single  node  or  a  group  of  nodes.  The  management  data  contains  the  following  entries: 

• RTS/CTS counts per data session

•  CRC  failure  counts 

•  Packet  header  information 

When  management  data  arrives,  an  SMS  controller  will  log  it  in  a  file  for  off-line  processing. 
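
The request/response exchange between an SMS controller and its agents can be sketched as below. This is a simplified model under stated assumptions: the class names, the variable names in the table, and the in-memory log (standing in for the log file) are hypothetical, and packet header information is omitted for brevity.

```python
from typing import Dict, List, Optional, Tuple


class SmsAgent:
    """Maintains a table of management variables and answers requests."""

    def __init__(self) -> None:
        # Hypothetical variable names for the counters listed in the text.
        self.variables: Dict[str, int] = {"rts_cts_count": 0, "crc_failures": 0}

    def handle_request(self, variable: Optional[str] = None) -> Dict[str, int]:
        """Return one variable, or a copy of the entire table when none is named."""
        if variable is None:
            return dict(self.variables)
        return {variable: self.variables[variable]}


class SmsController:
    """Requests management data from agents and logs the responses."""

    def __init__(self) -> None:
        # Assumption: an in-memory list stands in for the log file.
        self.log: List[Tuple[int, Dict[str, int]]] = []

    def request(self, agents: Dict[int, SmsAgent], node_ids: List[int],
                variable: Optional[str] = None) -> None:
        """Query a single node or a group of nodes and log each response."""
        for node_id in node_ids:
            self.log.append((node_id, agents[node_id].handle_request(variable)))
```

A controller would call `request(agents, [1], "crc_failures")` for a single variable from one node, or `request(agents, [1, 2])` for the full table from a group.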

The  component  view  of  SMS  is  depicted  in  the  right  half  of  Figure  7.  SMS  is  comprised  of  four 
components:  the  Manager,  Recorder,  CRC  Counter,  and  Session  Counters.  The  Manager  is  a 
software component that implements the SMS controller/agent functionality. When a node is
configured as an SMS controller, the Manager provides the interface to a user interface program
that initiates requests. When a node is configured as an SMS agent, the Manager configures the
other  components  to  service  SMS  controller  requests.  The  Recorder  maintains  the  management 
data,  which  it  records  on  a  per  packet  basis.  It  can  record  all  ingoing  and  outgoing  packets  or 
only  packets  of  a  specific  type.  The  Recorder  is  used  when  an  SMS  Agent  is  configured  to 
periodically  send  management  data  to  an  SMS  controller.  For  single  responses,  the  Manager  can 
optionally  read  counter  values  directly  through  the  Counter  Control  Service,  which  provides  the 
interface  to  access  the  CRC  and  Session  Counters.  All  inbound  and  outbound  data  must  pass 
through  the  SMS  Packet  Drivers  to  be  unpacked  or  packaged  into  packets. 

SUMMARY  OF  PERFORMANCE 

Using  SMS  for  management  data  collection  and  Excel  filters  for  off-line  processing,  we  can 
make  a  variety  of  performance  measurements,  including: 

•  Broadcast  packet  loss  rate  (PLR) 

•  RTS/CTS  session  success  rate 

•  Bit  error  rate  (BER) 

•  Power  estimates 

•  Route  traces 

•  Response  latency

Figure 7


The  figures  below  illustrate  a  sample  of  the  performance  measurements  that  can  be  extracted 
from  SMS.  For  each  figure,  1000  measurements  were  taken  per  data  point  with  the  controller 
node  requesting  management  data  at  a  rate  of  5  packets  per  second.  For  the  cycled-receiver 
MAC,  the  cycle  period  was  100  milliseconds  with  a  25%-duty  cycle.  Each  node  was  equipped 
with  a  6  dB  attenuator  to  limit  the  transmit  range  to  a  few  meters. 

Figure 8 shows how the broadcast packet loss rate (PLR) varies with distance. We compare the
results of a two-node and a five-node neighborhood, where each node implements the multi-channel,
non-cycling MAC model described above (see Datalink layer section). For the two-node
neighborhood, we have one controller and one sensor and vary the distance between them from
20 to 200 centimeters. For the five-node neighborhood, we have one controller node and four
sensor nodes. The first three sensors are spaced at 20-centimeter increments from the controller,
and the fourth sensor is moved from 20 to 200 centimeters from the controller. This figure
illustrates how just a few intermediate forwarding nodes can dramatically improve broadcast
reliability.

Packet loss rate for broadcast messages in a 2-node and 5-node neighborhood as a function of distance
between  the  controller  node  and  a  particular  sensor  node.  Forwarding  by  intermediate  nodes  dramatically 
improves  the  broadcast  reliability. 


Figure  8 


In Figure 9, we consider the five-node neighborhood again, but this time we compare the
multi-channel, non-cycling MAC to a 25%-duty cycling, multi-channel MAC. This figure illustrates
the trade-off between duty cycling the idle receiver (to conserve power) and broadcast
reliability. The 25%-duty cycled MAC has a PLR three to five times worse than the non-cycled
MAC. However, note that intermediate forwarding also improves the 25%-duty cycled MAC.
This suggests that greater node density might compensate for much of this trade-off. We are
still trying to verify this hypothesis.

In  Figure  10,  for  the  same  five-node  neighborhood,  we  consider  the  impact  varying  distance  has 
on  the  performance  of  unicast  data  transfers  (or  sessions).  These  results  are  for  the  multi-channel, 
non-cycling MAC. To complete a unicast session, it takes a minimum of three messages: one
request-to-send (RTS), one clear-to-send (CTS), and one data transmission (DTX). Up to 60
centimeters, about 90% of all unicast sessions complete with the minimum number of messages.
This region of the graph is favorable because it also corresponds to the minimum session setup
latency and the minimum session power consumption. Beyond 60 cm, the rate of success with
just three total messages drops off rapidly. However, up to 1.2 meters, at least 88% of all unicast
sessions complete with six or fewer total messages.
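
The cumulative rate of success discussed above can be computed from logged per-session message counts. The sketch below is only an illustration of that calculation; the function name and the input format (one integer per completed session) are assumptions, not the format of the SMS log files.

```python
from collections import Counter


def cumulative_success(message_counts, total_sessions):
    """Map n -> fraction of sessions that completed using at most n messages.

    message_counts holds #RTS + #CTS + #DTX for each completed session;
    total_sessions also counts sessions that never completed.
    """
    histogram = Counter(message_counts)
    running, result = 0, {}
    for n in sorted(histogram):
        running += histogram[n]
        result[n] = running / total_sessions
    return result
```

For example, if nine of ten sessions finish with three messages and the tenth needs six, the function reports 0.9 at three messages and 1.0 at six.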

A comparison of the broadcast packet loss rate (PLR) for a 100% duty cycle receiver and a 25% duty cycle
receiver. Duty cycling the receiver during idle time degrades broadcast reliability.

Figure 9

[Plot axis label: cumulative number of messages in a data session (#RTS+#CTS+#DTX)]

Five-node scenario: one controller and four sensors. The rate of success for unicast data sessions for a
given number of cumulative messages at varying distances from the controller.

Figure 10

These results are useful because they allow us to tune the time constants of our RTS/CTS style
MAC  for  different  topology  and  deployment  scenarios.  Additionally,  the  same  management  data 
can  be  used  to  extract  the  average  power  consumption  and  the  average  response  latency  per 
node.  Excel  filters  are  currently  under  development  to  extract  these  results. 

In Figure 11, we compare the unicast session performance of the multi-channel, cycled and
non-cycled MACs to the two-channel, cycled and non-cycled MACs. For this comparison, we
consider the five-node neighborhood with the farthest sensor node 60 centimeters from the
controller. These results show that the cycled receiver and the two-channel MAC have the
biggest impact on session performance for three- to five-message sessions. There is negligible
impact for sessions that complete using six or more messages. In contrast, when these two
techniques are combined, there is at least a 4% performance penalty for all sessions.

These  examples  demonstrate  the  utility  of  SMS,  but  by  no  means  are  they  exhaustive.  SMS  is  a 
powerful  network  management  tool  that  is  still  maturing. 


[Plot: rate of success versus cumulative number of messages in a data session (#RTS+#CTS+#DTX)]

Five-node scenario: one controller and four sensors, with the farthest node 60 cm from the controller. A
comparison of the rate of success for different MAC techniques.

Figure 11


2.A.2.8  PicoRadio  Test  Bed  deployment 

Authors:  Johnathan  Reason  and  Fred  Burghardt 

The Test Bed has been deployed in various forms and for various purposes over the past two
years.

SUMMARY OF EXPERIMENTS AND TEST RESULTS


RSSI  Profiling:  One  early  experiment  attempted  to  determine  the  nature  of  the  wireless 
environment  within  an  area  of  the  BWRC.  Two  nodes  equipped  with  Proxim  RangeLAN  radios 
were  connected  using  TDMA.  Measurements  were  taken  at  various  intervals  and  the  results 
compared.  These  tests  showed  a  periodic  fading  with  distance,  consistent  with  a  multi-path 
environment.  The  results  confirmed  an  initial  assumption  about  the  space. 


Local  Positioning  (Locationing):  Two  experiments  were  performed  using  an  algorithm  based 
on  least-squares  triangulation  using  received  signal  strength  indication  (RSSI),  a  notoriously 
error-prone number (50% accuracy). In the first experiment, one “target” node attempted to
locate  itself  based  on  information  from  four  “anchor”  nodes.  The  anchors  were  pre-programmed 
with  fixed  XYZ  coordinates.  Results  from  this  experiment  were  interesting  but  less  than 
spectacular.  The  target  node  could  reliably  detect  movement,  but  its  idea  of  absolute  position  was 
generally  poor.  In  the  second  experiment,  demonstrated  at  the  Winter  2001  BWRC  retreat,  the 
number  of  anchor  nodes  was  increased  to  eight.  The  results  were  significantly  better  than  the  first 
experiment,  confirming  that  redundancy  is  required  for  this  algorithm  using  RSSI  as  a  distance 
metric. 
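
A least-squares position estimate of this kind can be sketched as below. This is not the algorithm used in the experiments, whose details are not given here; it is a common linearized formulation, and it assumes the RSSI readings have already been converted to range estimates by some path-loss model. The function names are hypothetical.

```python
def solve_linear(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def locate(anchors, ranges):
    """Least-squares position from anchor coordinates and range estimates.

    Subtracting the first anchor's range equation from the others gives a
    linear system A x = b, solved through the normal equations.
    """
    a0, d0 = anchors[0], ranges[0]
    sq = lambda p: sum(c * c for c in p)
    rows, rhs = [], []
    for a, d in zip(anchors[1:], ranges[1:]):
        rows.append([2.0 * (ai - a0i) for ai, a0i in zip(a, a0)])
        rhs.append(sq(a) - sq(a0) - (d * d - d0 * d0))
    dim = len(a0)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(dim)]
    return solve_linear(ata, atb)
```

With exactly four anchors in three dimensions the system has no redundancy, so any range error maps directly into position error; additional anchors add rows that average the errors out, which is consistent with the improvement seen with eight anchors.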


Sensoring:  PicoRadio  networks  will  initially  be  used  in  sensoring  applications.  To  test  an 
application  layer  for  sensing  light,  temperature,  and  humidity,  a  sensor  board  was  built  and  a 
series  of  experiments  conducted  at  BWRC.  In  these  experiments,  a  user  requested  information 
from  the  network  via  a  graphical  user  interface  running  on  a  “controller”  node.  The  requests 
were  forwarded  across  a  Test  Bed  network  to  sensor  nodes  placed  at  various  points  throughout 
the  center.  These  nodes  would  take  the  measurement  requested  and  return  the  data  to  the 
controller.  The  controller  node  logged  the  data  to  a  file  on  a  laptop  via  the  serial  port,  and  a 
separate  program  generated  real-time  or  batch-oriented  graphs.  This  application  was 
demonstrated  at  the  Summer  2001  BWRC  retreat. 


Networking:  The  most  demanding  use  of  the  Test  Bed  is  to  emulate  the  PicoRadio  network. 
Network  routing  in  and  of  itself  is  complex  and  difficult  to  analyze  and  debug.  To  aid  in  this 
task,  various  data  gathering  mechanisms  were  embedded  into  the  Test  Bed  implementation  of  the 
PicoRadio  protocols.  One  use  of  the  information  gathered  is  to  map  routing  activity  in  the 
network.  A  graphical  user  interface  has  been  designed  to  display  the  current  location  of  nodes  in 
a deployment, the type of node, the type of information requested or returned, and the route this
information took on the return path to a controller. The GUI shows a physical space with colored
circles representing nodes; color indicates node type (sensor or controller). The circles ‘flash’
when receiving or sending data. A text box adjacent to the circle displays XYZ coordinates,
subtype of node (temperature, etc.), and last value. Lines between circles indicate data transmitted
between  nodes.  The  display  is  dynamic;  lines  appear  and  disappear  as  data  flows  through  the 
network.  The  activity  can  be  recorded  and  replayed  at  a  later  time  at  various  speeds  and  in  both 
directions.  This  application  was  demonstrated  first  at  the  Summer  2002  BWRC  retreat  and  then 
at  the  June  2002  PAC/C  PI  meeting  in  Pittsburgh,  Pa.  A  related  demo  showing  network  statistics 
such  as  retransmit  counts  and  CRC  failures  on  a  line  graph  similar  to  the  sensoring  application 
was shown at the Winter 2002 BWRC retreat.


Compass and Inclinometer: Two sensor boards were built for the Test Bed. The first contains
temperature,  light,  humidity  sensors  and  a  microphone.  This  board  was  used  in  the  sensoring  and 
audio  direction  finding  studies.  The  second  contains  a  two-axis  accelerometer  and  a  two-axis 
magnetometer.  Two  applications  were  designed  that  used  the  sensors  on  board  #2:  one 
application  used  the  accelerometer  to  determine  inclination  from  the  horizontal,  displayed  on  the 
eight  status  LEDs  mounted  on  the  digital  board.  The  second  application  used  the  magnetometer 
to  implement  a  compass,  where  the  headings  were  also  displayed  on  the  LEDs.  These 
applications were used mainly for entertaining demos. No one so far has used them for
back-country expeditions as far as we know.


Acoustic  Anemometer:  Sensor  board  #1  was  originally  designed  with  a  specific  experiment  in 
mind.  Air  flow  through  a  space  can  be  detected  by  variations  in  an  audio  signal  passing  through 
the  space.  The  sensor  board  contains  a  speaker  for  producing  tones  and  a  microphone  for 
detecting  the  tones.  Signal  processing  on  the  received  tones  can  be  done  in  the  FPGA  and 
processor  to  measure  the  rate  of  flow  of  air  along  the  axis  connecting  the  nodes. 

The  Test  Bed  was  used  for  a  series  of  experiments  in  acoustic  anemometry.  Results  were 
published  in  Karalar  (2002). 


Audio  Direction  Finding:  A  team  from  University  of  Illinois,  Urbana-Champaign  spent  two 
weeks  at  the  BWRC  conducting  data  gathering  to  evaluate  an  algorithm  to  locate  an  object  using 
sound.  Results  are  pending. 


2.A.2.9  Graphical  user  interfaces 

Several  GUIs  were  developed  to  aid  in  development  of  the  protocols.  Two  were  data  entry  tools 
and  two  were  display  tools. 

Figure  12  shows  the  Controller  Input  Panel.  We  use  this  tool  to  formulate  and  send  requests  for 
data.  It  provides  for  a  “program”  of  five  requests,  some  of  which  can  be  repetitive  as  indicated 
by  the  L  Reps  and  L  Secs  columns  on  the  right. 

Figure  13  shows  the  SMS  Input  Panel.  This  GUI  is  much  like  the  Controller  Input  Panel  except 
that  rather  than  requests  for  data  this  tool  handles  Statistics  and  Management  Service  requests. 

Figure 14 shows the Topology Mapping Tool. The green circle represents a controller and the
text to the right of the circle shows characteristics of a request for data that was just sent. The
yellow  circles  show  sensors  that  have  responded  in  the  past.  Adjacent  to  a  sensor  circle  is  text 
related  to  a  sample.  The  orange  line  between  node  3,4,1  and  the  controller  indicates  that  a  data 
transfer is in progress.

Figure 12: The Controller Input Panel


Figure 13: The SMS Input Panel

Figure 14: The Topology Mapping Tool

Figure 15 is a display of environmental data taken over about nine hours at the BWRC. The three
windows show simultaneous temperature, humidity, and light readings. Each line in a graph
represents one node.

Figure 15: Display of environmental data at BWRC over a nine-hour period


2.B  PICONODE  II  -  TWO-CHIP  PICONODE 
IMPLEMENTATION 

Authors:  M.  Josie  Ammer,  Michael  Sheets,  and  Mika  Kuulusa 

The PicoNode II protocol stack is realized with two custom ICs: a Baseband Processor (BBP)
implementing the PHY and a Wireless Protocol Processor (WPP) implementing the DLL and the
layers above. The interface between them is designed to be simple, with no external components,
so that a future revision could integrate them onto one chip. Figure 16 shows a block diagram of the
system. Each chip and its design methodology are described below, followed by testing,
conclusions, and results.

Figure 16: PN II system block diagram


2.B.1  Baseband  processor  (BBP) 

The  physical  layer  is  made  compatible  with  a  commercially  available  RF  front-end  (performing 
down  conversion  from  the  carrier),  ADC,  and  DAC.  Although  the  commercial  components  have 
high  power  consumption  resulting  from  their  tight  design  specs,  the  PHY  accommodates 
significantly  relaxed  specs  for  eventual  integration  with  a  custom,  low-power  analog  front  end 
(for  instance,  by  only  requiring  a  free-running  clock  with  50  ppm  accuracy).  The  chip  integrates 
all  other  PHY  receiver  and  transmitter  functions,  such  as  carrier  detect,  timing  recovery, 
synchronization,  and  detection. 

The air interface uses direct sequence spread spectrum (DSSS) with a length-31 spreading code
at 25 Mcps (million chips per second) and QPSK modulation, resulting in a raw data rate of 1.6
Mbps. DSSS was selected to combat narrowband fading. QPSK modulation was chosen for its
ease of low-power implementation with DSSS. A 25 Mcps chip rate provides the raw data rate of
1.6 Mbps needed to support the twenty 64 kbps TDMA slots specified by the protocol. The
primary receiver specifications are a +/-100 kHz maximum carrier frequency offset (+/-50 ppm
from a 2 GHz carrier reference), a 5 dB minimum input signal-to-noise ratio at the ADC, and a
50 ppm ADC sample clock. The BBP supports a typical indoor frequency-selective wireless
channel with mobile units traveling at foot speeds.
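
The figures quoted above follow from simple arithmetic, checked in the short sketch below. The variable names are chosen for illustration only; the numbers are those given in the text.

```python
CHIP_RATE = 25e6          # chips per second
CODE_LENGTH = 31          # chips per QPSK symbol
BITS_PER_SYMBOL = 2       # QPSK carries two bits per symbol

symbol_rate = CHIP_RATE / CODE_LENGTH          # about 806 ksymbols/s
raw_bit_rate = symbol_rate * BITS_PER_SYMBOL   # about 1.61 Mbps
tdma_payload = 20 * 64e3                       # twenty 64 kbps slots = 1.28 Mbps
max_carrier_offset = 50e-6 * 2e9               # 50 ppm of 2 GHz = 100 kHz
```

The raw rate of roughly 1.61 Mbps exceeds the 1.28 Mbps slot payload, which leaves about 0.33 Mbps of headroom.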

High-level system exploration in Simulink enabled algorithm refinement and power optimization
of  the  physical  layer.  A  block  diagram  is  shown  in  Figure  17.  The  RX/TX  Controller  state 
machine  interfaces  with  the  WPP  and  controls  the  flow  of  the  data  from  one  datapath  block  to 
another.  The  BBP  incorporates  5  gated  clock  domains  that  are  adaptively  switched  on  by  the 
RX/TX  Controller  for  maximal  energy  efficiency.  Communication  with  the  WPP  is  carried  out 
through  a  7-wire  Physical-to-Protocol  Interface  (PPI)  for  RX/TX  data  and  a  system  bus  interface 
for  initialization. 


[Block diagram: I/Q RX stream input, I/Q TX stream output]

Figure 17: Baseband Processor (BBP) block diagram


During receive, the baseband signal is sampled by an off-chip 8-bit ADC at 100 Msps (4 samples
per chip). This 100 MHz stream is split into 4 parallel streams of 25 MHz each so that the BBP
can operate off the slower 25 MHz chip clock, which reduces power by allowing a lower operating
voltage. Parallel filter techniques are used to process these four streams with an interpolation
filter that increases the receiver timing resolution to 8 samples per chip. Performing on-chip
interpolation of the signal consumes less power than running the ADC at twice the rate.
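
The two steps in this paragraph, splitting the sample stream into parallel phases and doubling the timing resolution, can be illustrated with a small sketch. The BBP interpolation filter itself is not described here, so plain linear interpolation is used as a stand-in, and both function names are hypothetical.

```python
def demux(samples, ways=4):
    """Split one sample stream into `ways` parallel streams (one per phase)."""
    return [samples[phase::ways] for phase in range(ways)]


def interpolate_2x(samples):
    """Double the sample rate by inserting midpoints (linear interpolation)."""
    out = []
    for current, following in zip(samples, samples[1:]):
        out.extend([current, (current + following) / 2])
    out.append(samples[-1])
    return out
```

Each of the four output streams of `demux` runs at one quarter of the input rate, which mirrors the 100 MHz to 4 x 25 MHz split.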

Performing  timing  recovery  in  two  successive  stages  reduces  power  consumption  of  this  function 
by  a  factor  of  two.  First,  the  coarse  timing  block  performs  carrier  detect,  and  estimates  timing  to 
within  3/8  chip.  Then,  the  fine  timing  block  estimates  timing  to  within  1/8  chip  and  estimates  the 
carrier  frequency  offset  to  within  2.5  Hz. 

The  rotate  and  correlate  block  corrects  the  frequency  offset,  correlates  the  incoming  signal  with 
the  spreading  code,  and  performs  early/late  detection  to  track  the  optimal  timing  instant.  The 
correlated  symbols  are  fed  into  the  phasor  locked  loop  (PhLL)  where  the  phase  error  is  corrected 
using feedback and the QPSK symbols are demodulated. Where possible, coefficients were
restricted to powers of two, so that shift-and-add operations could be used instead of the more
power-hungry multiplication operations.
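
As an illustration of the shift-and-add idea, the sketch below multiplies an integer sample by a coefficient that is a sum of a few powers of two. It is a software analogy of the hardware technique, with a hypothetical function name, not a description of the BBP datapath.

```python
def shift_add_multiply(x, shifts):
    """Multiply integer x by sum(2**s for s in shifts) using shifts and adds.

    A negative s means a right shift, i.e. multiplication by 2**s with
    truncation, as in fixed-point hardware.
    """
    total = 0
    for s in shifts:
        total += x << s if s >= 0 else x >> -s
    return total
```

For instance, a coefficient of 5 is 4 + 1, so `shift_add_multiply(x, [2, 0])` needs one shift and one add instead of a multiplier.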

In  the  transmit  mode,  data  bits  are  mapped  into  QPSK  symbols,  spread,  raised-cosine  filtered, 
and  passed  to  an  off-chip  DAC.  The  transmitter  datapath  has  a  dual-channel  spreader  and  two 
25-tap  raised-cosine  filters  (alpha  =  0.30). 
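
The first two transmit steps (QPSK mapping and spreading) and the matching receive-side correlation can be sketched as a round trip. Several things here are assumptions made for illustration: the actual spreading code is not given, so a length-31 maximal sequence from a 5-stage LFSR is used; the same code is applied to both the I and Q channels; and the raised-cosine filter and channel are omitted.

```python
def m_sequence():
    """Length-31 maximal sequence from a 5-stage LFSR, as +1/-1 chips."""
    state = [1, 0, 0, 0, 0]
    chips = []
    for _ in range(31):
        chips.append(1 - 2 * state[-1])
        feedback = state[4] ^ state[1]
        state = [feedback] + state[:-1]
    return chips


def qpsk_map(bits):
    """Map bit pairs to QPSK symbols: bit 0 -> +1, bit 1 -> -1 on each axis."""
    return [complex(1 - 2 * i, 1 - 2 * q) for i, q in zip(bits[0::2], bits[1::2])]


def spread(symbols, code):
    """Repeat each symbol over one code period, multiplied chip by chip."""
    return [s * c for s in symbols for c in code]


def despread(chips, code):
    """Correlate each code-length block with the code to recover symbols."""
    n = len(code)
    return [sum(x * c for x, c in zip(chips[k:k + n], code)) / n
            for k in range(0, len(chips), n)]


def qpsk_demap(symbols):
    """Hard-decide each symbol back into two bits."""
    bits = []
    for s in symbols:
        bits += [int(s.real < 0), int(s.imag < 0)]
    return bits
```

Connecting `spread` directly to `despread` is loosely analogous to a loopback test: with no noise, the recovered bits match the transmitted ones.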

The  BBP  has  several  features  that  facilitate  testing,  including  a  full  scan  chain.  Although  the  chip 
supports  a  programmable  spreading  code,  hard-wired  codes  are  used  during  test  to  reduce  setup 
complexity.  A  loopback  mode  connects  the  transmitter  output  stream  to  the  receiver  input  stream 
on  the  same  chip,  while  the  transmitter  output  pins  are  converted  into  a  64-bit  test  bus  during 
receiver  testing.  Internal  receiver  signals  such  as  the  RX/TX  controller  state,  code  matched  filter 
outputs,  frequency  estimate,  and  soft  symbols,  are  output  to  this  bus  to  aid  in  testing  and  debug. 


2.B.1.1  BBP  design  methodology 

The  design  flow  of  the  datapath-dominated  BBP  allows  high-level  design  exploration  using 
MATLAB/Simulink  dataflow  diagrams.  The  most  efficient  architectures  in  terms  of  power  and 
area  can  be  obtained  by  directly  mapping  these  dataflow  algorithms  into  hardware. 
Computational  energy  and  area  efficiencies  that  can  be  achieved  with  this  approach  are  2  to  3 
orders  of  magnitude  higher  than  the  efficiency  achieved  by  software  processors  (Brodersen 
1997).  In  this  way,  the  maximum  parallelism  can  be  obtained,  allowing  the  minimum  clock  rate 
and  supply  voltage  to  be  used,  resulting  in  reduced  energy  per  operation  (Chandrakasan  and 
Brodersen  1995).  High-level  power  estimation  and  successive  refinement  are  achieved  with 
parameterized  modules  programmed  in  Synopsys  Module  Compiler.  An  in-house  back-end 
design  flow,  called  SSHAFT  (Davis,  et  al.  2002),  allows  a  direct  path  from  Simulink  and  Module 
Compiler  to  heavily  parallelized,  direct-mapped  ASIC  implementations. 

Since the entire design is encapsulated in Simulink, it can be simulated along with models of
the analog front end. Therefore, the effect of analog nonidealities and fixed-point computation can
be  evaluated  at  a  system  level.  Extensive  system  level  simulations  were  done  to  ensure  proper 
operation  over  the  range  of  channel  and  circuit  nonidealities.  For  instance,  Figure  18  shows  the 
locking  behaviour  of  the  PhLL. 

The  SSHAFT  flow  enables  early  exploration  for  architectural  tradeoffs  between  power 
consumption,  speed,  and  die  area.  Fixed-point  Simulink  library  models  correspond  to 
parameterized  arithmetic  units  designed  in  Module  Compiler.  These  modules  can  be  quickly 
compiled  with  given  parameters  to  form  a  gate-level  netlist  from  which  accurate  power 
estimations  can  be  made.  For  instance,  four  microarchitectures  are  available  for  complex 
multiply-accumulate  (MAC)  operations,  as  illustrated  in  Figure  19.  The  flow  can  be  used  to 
quickly  decide  which  microarchitecture  results  in  the  lowest  power  for  the  particular  input  and 
output bit widths required by the algorithm.

Figure 18: Simulation and matching chip test results of the BBP PhLL output


Figure 19 (a) through (d): Four architectures implementing a complex MAC operation

The SSHAFT flow also automates the physical design process. Simulink ‘enable signals’ are
converted  into  gated  clocks  for  automatic  clock  tree  generation.  This  reduces  power  by 
eliminating  the  switching  activity  when  a  block  is  not  in  use.  While  the  BBP  is  datapath 
dominated,  some  control  is  still  required  to  steer  the  data  through  the  datapath  blocks  and  to  turn 
on  and  off  the  gated  clock  domains.  Controllers  are  described  as  state  machines  in 
Simulink/Stateflow  and  they  are  automatically  translated  to  VHDL  and  merged  into  the  design. 

The  BBP  uses  a  hierarchical  floorplan  consisting  of  two  levels  of  hierarchy.  Each  block  at  the 
lower  level  is  placed  and  routed  separately  and  then  connected  at  the  top  level.  Trial  runs  done 
on  a  preliminary  netlist  showed  that  hierarchical  place  and  route  resulted  in  18%  smaller  area 
than  flat  place  and  route.  Switch-level  simulations  of  the  extracted  layout  were  conducted  in 
PathMill  to  ensure  that  the  design  met  timing  over  all  corners.  Hierarchical  physical  design, 
parallelized  datapaths,  and  reduced  clock  speeds  facilitate  quick  timing  closure  and  allow  the 
design  to  meet  timing  constraints  on  the  first  pass. 


2.B.2  Wireless  protocol  processor  (WPP) 

The  WPP  is  an  energy-efficient  realization  of  the  DLL,  transport,  session,  and  application 
protocol layers. A block diagram is depicted in Figure 20. To reduce design time and
leverage  existing  work,  the  WPP  design  is  an  experiment  in  integrating  custom  logic  with 
commercial  IP  blocks.  The  architecture  is  a  system-on-chip  design  consisting  of  multiple  cores 
connected  by  a  system  bus.  The  main  components  on  the  chip  are  a  Sonics  Silicon  Backplane 
system  bus  that  connects  a  Tensilica  T1030  Xtensa  RISC  microprocessor  with  64/64kbyte 
instruction/data  memories,  a  Protocol  Processing  Engine  (PPE),  and  various  interface  units. 
Each  block  is  described  below. 

The  use  of  a  standard  interface  to  the  interconnect  network,  as  enabled  by  Silicon  Backplane 
network  architecture  from  Sonics,  facilitates  the  realization  of  correct  communication  between 
multiple  cores  on  a  die.  The  Silicon  Backplane  is  a  pipelined  system  bus  arbitrated  with  a  hybrid 
time-division  and  round-robin  scheme  (Wingard  2000).  By  adopting  a  standard  protocol  (OCP), 
this  approach  simplifies  the  design  process  by  clearly  defining  the  interface  between  blocks. 
Additionally,  the  standard  interface  supports  easy  core  reuse,  which  can  reduce  time  to  market 
for  future  projects. 
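
A hybrid time-division and round-robin arbitration scheme can be sketched as below. The text only names the two mechanisms; the ordering used here (the owner of the current time slot wins if it is requesting, otherwise the grant falls to round-robin among the remaining requesters) is an assumption, and the function is hypothetical rather than a model of the Sonics implementation.

```python
def arbitrate(slot_owner, requests, last_granted):
    """Pick the core that gets the bus in the current cycle.

    slot_owner   -- core id that owns this time-division slot, or None
    requests     -- set of core ids currently requesting the bus
    last_granted -- core id granted most recently by the round-robin stage
    """
    if slot_owner in requests:
        return slot_owner                  # time-division stage wins
    if not requests:
        return None                        # bus stays idle
    ordered = sorted(requests)
    for core in ordered:                   # round-robin fallback
        if core > last_granted:
            return core
    return ordered[0]                      # wrap around
```

The time-division stage gives a core guaranteed bandwidth, while the fallback keeps unused slots from being wasted.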

Figure 20: Wireless Protocol Processor (WPP) block diagram


The  Xtensa  microprocessor  performs  system  initialization  and  allows  flexible  implementation  of 
the  application,  session,  and  transport  protocol  layers.  Analysis  of  the  processor  requirements 
shows  that  a  modest  12.5  MHz  system  clock  is  sufficient,  which  reduces  power  by  lowering  the 
switching frequency and allows the use of a 1.0 V core supply voltage. The Xtensa was chosen
because it supports design-time customizations, which allow the selection of datapath
components and memory hierarchy to be tailored to match requirements. This results in a
lower-power solution, because it prevents wasted power due to over-design.

The  application-specific  PPE  block  efficiently  implements  the  DLL  and  the  mu-law  companding 
logic.  Data  to  be  transmitted  comes  from  either  the  control  messages  sent  by  the  transport  layer 
in  software  or  from  the  audio  data  path.  The  audio  data  path,  including  the  mu-law  compander,  is 
implemented  entirely  in  hardware  for  efficiency. 
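
For reference, the continuous mu-law companding curve can be written in a few lines. The WPP implements companding in hardware and its exact parameters are not stated here; the value mu = 255 below is an assumption (it is the value commonly used in telephony), and the function names are illustrative.

```python
import math

MU = 255  # assumed; the companding parameter is not stated in the text


def mu_compress(x):
    """Compress a sample in [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Invert mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

The curve expands small amplitudes and compresses large ones, so quiet audio keeps more resolution after quantization.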

Additional custom logic is included for off-chip interfaces to the BBP, a Xilinx FPGA, and an 8
Mbit flash memory. The BBP interface allows software programmability of the baseband
spreading code. Using the FPGA interface, the WPP can configure a Xilinx Virtex chip using the
slave-serial programming mode. The flash memory interface allows the Xtensa software and
Xilinx configuration code to be stored and automatically booted during system initialization, as
shown in Figure 21.

PicoNode II - BootMonitor v1.3a, FPGA v1.7
Command> ?

Command list:

X - Reset the FPGA
R - Read a memory address
W - Write a memory address
P - Dump the protocol parameters
C - Dump the node configuration parameters
F - Reset to factory defaults
1 - Set the remote ID
T - Test protocol program
! - Download old boot fpga image
1 - Upload a new boot fpga image
2 - Upload a new boot sw image
3 - Upload a new application fpga image
4 - Upload a new application sw image
5 - Reboot the boot image
B - Boot the application image
? - This help screen

Command> B

Booting application sw...


PicoNode II - WPP TEST v1.1

I am a basestation, setting diagnostic mode 2...done
Installing interrupt handler...done
Enabling interrupts...done
Waiting for event from a remote...


Connected 0:01:41 VT100 US2008-N-1


Figure 21: Screen shot of WPP boot configuration menu and base station application


The remaining interfaces, including the RS-232 serial port, a JTAG test access port (TAP), and
a special manufacturing test port, are used for system testing and debugging. The serial port
interface is used to output an activity log and to download software upgrades. The TAP allows
an external debugger to observe and control the Xtensa by setting breakpoints and
single-stepping the software code. A special test mode allows detection of manufacturing faults by
converting 19 pins of a data bus into a port that controls access to the on-chip scan chains and
Built-In Self-Test (BIST). The BIST uses the Marinescu 17N algorithm to detect faults in the
memory arrays.
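
To show how a march-style memory test works in general, the sketch below runs the well-known March C- test (10 operations per cell) against a toy memory model. This is deliberately not the Marinescu 17N algorithm used by the WPP BIST; March C- is substituted because it is shorter, and the `Memory` class is a hypothetical stand-in for a real array.

```python
def march_c_minus(read, write, size):
    """Run March C- over addresses 0..size-1; return True if no fault is seen."""
    up, down = range(size), range(size - 1, -1, -1)
    for addr in up:                       # element 1: write 0 everywhere
        write(addr, 0)
    elements = [(up, 0, 1), (up, 1, 0), (down, 0, 1), (down, 1, 0)]
    for order, expect, new in elements:   # read expected value, write its inverse
        for addr in order:
            if read(addr) != expect:
                return False
            write(addr, new)
    return all(read(addr) == 0 for addr in up)   # final read of 0


class Memory:
    """Toy bit memory with an optional stuck-at fault, for demonstration."""

    def __init__(self, size, stuck=None):
        self.cells = [0] * size
        self.stuck = stuck                # (address, value) or None

    def write(self, addr, bit):
        self.cells[addr] = bit

    def read(self, addr):
        if self.stuck and self.stuck[0] == addr:
            return self.stuck[1]
        return self.cells[addr]
```

A fault-free memory passes, while a cell stuck at either value fails on the first read that expects the opposite value.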

2.B.2.1 WPP design methodology

The  design  flow  of  the  control-dominated  WPP  is  based  on  system-level  exploration  between 
hardware and software implementation trade-offs. The protocol stack comprises 45 Co-design
Finite State Machines (CFSMs), a model chosen for its suitability for modelling control systems while
maintaining support for datapath operations without requiring assumptions about the underlying
implementation. The CFSMs are captured and simulated within the Cadence VCC tool.

The  simulation  model  within  VCC  is  iteratively  refined  as  part  of  the  implementation  process. 
The  initial  simulation  allows  the  designer  to  focus  on  the  correct  operation  of  the  algorithm 
sequences  by  abstracting  time.  Once  the  correct  operation  is  verified,  the  functions  are  mapped 
onto  target  architectures.  Architectural  models  for  the  Xtensa  and  custom  logic  allow  estimation 
of  actual  delays  to  be  included  in  the  system-level  simulation.  Alternative  functional  mappings 
onto  hardware  architectures  are  evaluated  to  identify  an  implementation  that  minimizes  power 
consumption while meeting the timing and flexibility requirements. The conceptual protocol
stack and final partitioning are shown in Figure 22. For most functions, hardware realizations are
favored for their energy efficiency, but high-level protocol layers are mapped into software to
allow changes to the user interface and communication channel allocation algorithm. Energy
consumption  is  optimized  by  identifying  an  architecture  that  meets  the  design  requirements 
without  surplus,  such  as  finding  the  minimum  allowable  Xtensa  clock  frequency.  In  addition,  the 
parameters  for  the  Xtensa  design-time  customizations  are  based  upon  the  results  of  the 
architectural  exploration. 
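
The idea of meeting the requirements "without surplus" can be illustrated with a small sketch. It is not the VCC exploration itself: the cycle counts, deadlines, and candidate frequencies below are hypothetical values chosen only to show the selection rule, which is to pick the lowest clock frequency at which every software task still meets its deadline.

```python
def min_clock_frequency(tasks, candidates_hz):
    """Return the lowest candidate frequency at which every task meets its deadline.

    tasks: list of (worst_case_cycles, deadline_seconds) pairs (hypothetical values).
    candidates_hz: clock frequencies the architecture could be run at.
    Returns None if no candidate is fast enough.
    """
    for freq in sorted(candidates_hz):
        if all(cycles / freq <= deadline for cycles, deadline in tasks):
            return freq
    return None


# Hypothetical workload: two protocol tasks with cycle counts and deadlines.
tasks = [(20_000, 2e-3), (5_000, 1e-3)]
candidates = [6.25e6, 12.5e6, 25e6, 50e6]
print(min_clock_frequency(tasks, candidates))
```

In a real flow the cycle counts would come from the architectural models of the processor rather than from fixed numbers as assumed here.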




Figure  22:  Protocol  stack  functions  and  hardware/software  partitioning 

Once a suitable architectural mapping is chosen, the functions are implemented according to the
target  implementation  type.  Software  code  is  generated  directly  from  the  VCC  description. 
Custom  hardware  is  implemented  through  manual  translation  to  synthesizable  RTL  Verilog  code. 
A  complete  Verilog  implementation  is  realized  after  incorporation  of  the  code  for  the  Xtensa, 
Silicon  Backplane,  and  interface  logic. 

Three levels of simulation ensure the correct implementation of the WPP before physical
implementation. First, each core on the chip is simulated independently within a Verilog
simulator to verify the functionality and check compliance with the standard interface. Second, the correct
operation of a node is checked using a Seamless co-simulation of the software, running on an
instruction set simulator (ISS), and the custom hardware in the Verilog simulator. Once the
co-simulation operates correctly, the RTL code for the Xtensa is substituted for the ISS to verify the
processor interface. The co-simulation step is preferred for early simulations due to its enhanced
software debugging features and reduced simulation run-times. Third, a system test consisting of a base
station and two remotes is used to check correct operation of the TDMA protocol.

The resulting Verilog code is implemented using an industry-standard timing-driven digital
design flow. Power consumption is reduced by using a low-leakage standard cell library
during  synthesis.  After  floor  planning,  placement,  and  routing,  the  extracted  layout,  including 
parasitics,  is  verified  through  static  timing  analysis  and  switch-level  simulation. 


2.B.3  Testing  and  results 

The BBP and WPP ASICs are implemented in a triple-well, 0.18 µm digital CMOS process with
6 metal layers. The 600k-transistor BBP chip has a core area of 2.2 mm² and a pad-limited die area of
14.5 mm². The 1.3M-transistor WPP has a die area of 17.6 mm². The die
micrographs are shown in Figure 23 and Figure 24. Approximately 3500 lines of application and
initialization C code are compiled for the Xtensa processor.

The BBP and WPP chips are fitted on a system board that, in combination with a 2.4GHz
RD0310 radio board, forms the test system. A significant portion of the test board is used to ease
testing  of  the  two  chips,  so  a  production  prototype  would  be  significantly  smaller.  A  photograph 
of  the  test  board  is  shown  in  Figure  25. 

The system board includes a Xilinx FPGA (XCV300E series), an on-board ADC and DAC to interface
with the radio, power supply regulators, crystal oscillators, and other configuration circuitry and
test headers. The BBP and WPP have separate power supply domains from the rest of the board,
so that the core power consumption can be measured.

Figure 23: WPP die micrograph.

Figure 25: Test board for entire system.


The analog quadrature baseband I/O of the 2.4GHz radio board is attached directly using an
array of SMA barrel adapters. Digital PLL programming and other RF control of the radio board
are carried out by the FPGA. The system board also supports digital RX gain control and manual
TX power control. The board has 8 copper layers and measures 12"x14". The power
consumption of the PN II prototype is depicted in Figure 26. In the active mode, the average
current drawn from a 7V supply is 370mA (2.6W). The RF front-end
(50/50 RX/TX duty cycle), FPGA, and data converters consume 76% (2.0W) of the overall
power. The converters draw significant power mainly because they are 3.3V parts
sampling at 100MHz. The remaining 0.6W is divided among voltage regulator loss (180mW),
clock generation/distribution (1.65mW), the MP3 audio decoder (83mW), and other circuitry, such
as the RS-232 line driver, 1.8/3.3V level conversion, and LEDs.
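
The active-mode power budget can be checked with a few lines of arithmetic. The sketch below only re-derives numbers quoted in the text (7V, 370mA, 2.0W, 180mW, 1.65mW, 83mW); the variable names are illustrative, and the "unitemized" figure is simply whatever is left after subtracting the itemized entries, not a measured value. Because the quoted figures are rounded, the computed share differs from the quoted 76% by about one percentage point.

```python
SUPPLY_V = 7.0            # board supply voltage quoted in the text
ACTIVE_CURRENT_A = 0.370  # average active-mode current quoted in the text


def power_budget():
    """Re-derive the active-mode power split from the quoted figures."""
    total_w = SUPPLY_V * ACTIVE_CURRENT_A          # about 2.6 W
    major_w = 2.0                                  # RF front-end + FPGA + data converters
    remaining_mw = (total_w - major_w) * 1000.0    # about 0.6 W
    itemized_mw = {
        "voltage regulator loss": 180.0,
        "clock generation/distribution": 1.65,
        "MP3 audio decoder": 83.0,
    }
    # Whatever is not itemized (line driver, level conversion, LEDs, ...).
    unitemized_mw = remaining_mw - sum(itemized_mw.values())
    return total_w, major_w / total_w, remaining_mw, unitemized_mw


total_w, major_share, remaining_mw, unitemized_mw = power_budget()
print(f"total {total_w:.2f} W, major blocks {major_share:.0%}, "
      f"remaining {remaining_mw:.0f} mW, unitemized {unitemized_mw:.0f} mW")
```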

Figure 26: PicoNode 2 prototype power consumption by section


The Xilinx FPGA on the test board can be programmed to perform many testing functions, including
acting as a pattern generator. For the BBP, a MATLAB program automatically converts the
Simulink test vectors into initialization commands for the Xilinx block memories, and the
vectors are fed to the chip during test. The chip outputs are captured by a logic analyzer
(HP 16702A) and compared with the expected outputs from the Simulink simulation (as shown
in Figure 27 for the PhLL outputs). For the WPP, a loopback mode is implemented in the Xilinx FPGA
to emulate the bit stream from the BBP. The correct operation of the PPI interface is verified by
comparing the logic analyzer traces with the expected values from the RTL simulations. All chip
outputs and test bus signals matched the expected simulation results.

The  BBP  chip  consumes  14  mW  on  average  when  receiving  a  short  packet  consisting  of  a  40- 
symbol  synchronization  word  and  20  data  symbols.  Longer  packets  have  lower  average  power 
consumption  because  the  high  power  consumption  during  the  fixed-length  synchronization  word 
is  averaged  over  a  longer  payload.  During  idle  mode  (TX  and  RX  off),  the  chip  consumes  less 
than  1  mW.  When  three  nodes  are  connected  to  the  network,  the  BBP  is  in  transmit  mode  for 
one  slot  and  receive  mode  for  2  slots,  so  the  duty  cycle  is  15%.  Under  these  conditions,  the 
expected  power  consumption  of  the  BBP  is  3  mW.  The  WPP  chip  consumes  10  mW  on  average 
when  three  nodes  are  connected  to  the  network.  The  actual  power  consumption  varies  depending 
on  whether  the  node  is  a  remote  or  a  base  station  and  the  number  of  slots  a  remote  is  monitoring. 
A summary of BBP and WPP statistics is given in Table 2.
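
The 3 mW estimate for the BBP follows from a simple duty-cycle average. The sketch below makes two simplifying assumptions that are not stated in the text: the 14 mW short-packet receive figure is used for every active slot (transmit and receive alike), and the idle power is taken at its 1 mW upper bound.

```python
def average_power_mw(active_mw, idle_mw, duty_cycle):
    """Time-weighted average of active and idle power for a given duty cycle."""
    return duty_cycle * active_mw + (1.0 - duty_cycle) * idle_mw


# 15% duty cycle with three nodes in the network (figures from the text).
bbp_avg = average_power_mw(active_mw=14.0, idle_mw=1.0, duty_cycle=0.15)
print(f"BBP average power: {bbp_avg:.2f} mW")
```

Under these assumptions the average comes out just under 3 mW, in line with the expected figure quoted above.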

Figure 27: Simulation and matching chip test results of the BBP PhLL output


                                BBP                              WPP

Process                         Triple-well, 0.18 µm digital CMOS, with 6 metal layers (both chips)
Transistors                     600 K                            1.3 M
Area                            Core: 2.2 mm², Die: 14.5 mm²     Die: 17.6 mm²
Package                         208-pin PGA                      208-pin PGA
Core power supply               1 V                              1 V
I/O voltage                     1.8 V                            1.8 V
Clock frequency                 25 MHz                           12.5 MHz
Average power during system     3 mW (15% duty cycle)            10 mW
operation (3 nodes in network)

Table  2:  BBP  and  WPP  statistics 

2.B.4 Lessons learned


Several  lessons  were  learned  from  the  implementation  of  the  BBP  chip.  In  future  designs,  the 
analog  front-end  should  be  included  in  the  physical  layer  design  methodology.  Recent  work  from 
the  PicoRadio  project  (Schuster  2002)  shows  that  large  energy  savings  can  be  realized  in  the 
analog  front-end  by  making  use  of  novel  RF  circuits.  The  SSHAFT  flow  should  be  modified  to 
produce  a  gate  level  netlist  of  the  entire  design.  Because  the  high-level  description  was  mapped 
directly  to  layout,  chip-level  verification  must  be  done  through  slow  transistor  level  simulations. 
This  prohibits  extensive  verification  of  low-level  details,  such  as  interactions  of  gated  clock 
domains.  Whenever  possible,  output  pins  should  be  converted  to  dual  use  test  pins.  The  BBP’s 
64-bit  test  bus  proved  useful  to  verify  that  the  chip  was  being  clocked  properly,  that  power  was 
reaching  the  chip,  and  that  the  chip  was  correctly  inserted  into  the  socket.  Indeed,  this  bus 
proved  invaluable  in  quickly  identifying  a  bonding  error  that  occurred  during  the  packaging 
process. 

Lessons  from  the  WPP  chip  indicate  that  high-level  design  methodologies  can  help  to  identify  an 
architecture  that  exactly  meets  the  design  requirements  and  minimizes  power  consumption.  To 
select  the  correct  architecture,  good  models  of  the  IP  must  be  available.  Without  reliable  models, 
the  system  must  be  over-designed  to  ensure  correct  operation  in  the  presence  of  this  uncertainty. 
Another  lesson  is  that  the  perceived  functionality  of  a  system  can  comprise  only  a  fraction  of  the 
architecture  because  interface  and  test  logic  must  be  considered.  Reusing  existing  or  purchasing 
commercial IP can minimize the additional design time required for these blocks.
Interconnection of these blocks is simplified by conforming to a standardized interface, similar to
that used by the Silicon Backplane.


2.C  PICONODE  III  -  ULTRA-LOW  POWER  PICONODE 

2.C.1  Protocol  stack  for  PicoRadios 

2.C.1.1  Low  energy  ad  hoc  networking  for  PicoRadio 

Author:  Rahul  Shah 

The goal of the PicoRadio project is to build a wireless sensor network that is versatile,
self-organizing, dynamically reconfigurable, and multi-functional. With the primary constraint at the
nodes  being  the  extremely  low  energy  budget,  the  network  layer  has  to  route  packets  intelligently 
to  maximize  the  network  lifetime  or  the  survivability  of  the  network. 

We have previously shown that routing packets along the lowest energy paths is not optimal for
the network lifetime. A new probabilistic forwarding scheme was proposed that uses a set of
routes between source and destination in a probabilistic fashion. This reduces the problem of
hotspots and uneven energy usage across the network. However, to further improve the routing
efficiency, it is necessary to reduce the overhead involved in forwarding packets.

In  particular,  we  considered  the  problem  of  minimizing  the  amount  of  communication  needed  to 
send  readings  from  a  set  of  sensors  to  a  single  destination  in  energy  constrained  wireless 
networks.  Substantial  gains  can  be  obtained  using  packet  aggregation  techniques  while  routing. 
The  routing  algorithm  we  developed,  called  Data  Funnelling,  allows  the  network  to  considerably 
reduce  the  amount  of  energy  spent  on  communication  setup  and  control,  an  important  concern  in 
low  data-rate  communication.  This  is  achieved  by  sending  only  one  data  stream  from  a  group  of 
sensors  to  the  destination  instead  of  having  an  individual  data  stream  from  each  sensor  to  the 
destination.  This  strategy  also  decreases  the  probability  of  packet  collisions  when  transmitting  on 
a  wireless  medium  because  incorporating  the  information  of  many  small  packets  into  few  large 
ones  reduces  the  total  number  of  packets. 

Additional  gains  can  be  realized  by  efficient  compression  of  data.  This  is  achieved  by  losslessly 
compressing  the  data  by  encoding  information  in  the  ordering  of  the  sensors’  packets.  This 
“coding  by  ordering”  scheme  compresses  data  by  suppressing  certain  readings  and  encoding 
their  values  in  the  ordering  of  the  remaining  packets.  Using  these  techniques  together  can  more 
than  halve  the  energy  spent  in  communication. 
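
The principle behind "coding by ordering" is that n distinguishable packets can be sent in n! different orders, so the order itself can carry a value of up to log2(n!) bits. The sketch below shows one possible mapping between an integer and an ordering, based on the factorial number system; this mapping and the function names are illustrative assumptions and are not claimed to be the encoding used in Data Funnelling.

```python
from math import factorial


def int_to_ordering(value, packet_ids):
    """Encode `value` (0 <= value < n!) as an ordering of the n packet ids."""
    pool = sorted(packet_ids)
    if not 0 <= value < factorial(len(pool)):
        raise ValueError("value does not fit in the available orderings")
    ordering = []
    for remaining in range(len(pool), 0, -1):
        # Each digit of the factorial number system selects one remaining id.
        index, value = divmod(value, factorial(remaining - 1))
        ordering.append(pool.pop(index))
    return ordering


def ordering_to_int(ordering):
    """Recover the value hidden in the ordering produced by int_to_ordering."""
    pool = sorted(ordering)
    value = 0
    for packet_id in ordering:
        index = pool.index(packet_id)
        value += index * factorial(len(pool) - 1)
        pool.pop(index)
    return value


# Four remaining packets give 4! = 24 orderings, enough to carry one
# suppressed reading quantized to the range 0..23 (hypothetical example).
order = int_to_ordering(17, [1, 2, 3, 4])
print(order, ordering_to_int(order))
```

The receiver sorts the packet ids it received, compares that with the arrival order, and recovers the suppressed value without any extra bits being transmitted.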

We  also  explored  the  effect  of  altruists  in  the  network.  Altruists  are  nodes  that  have  a  higher 
amount  of  energy  than  most  of  the  other  nodes  in  the  network  and  offer  to  route  packets  due  to 
their  higher  energy.  We  simulated  a  network  where  such  altruists  help  in  forwarding  packets  to 
see  if  that  helps  in  increasing  the  network  lifetime.  Although  in  some  cases  it  helped,  in  most 
cases  we  observed  that  bottlenecks  still  occur  in  the  network  due  to  the  neighbors  of  the  altruists 
burning  a  lot  of  energy.  Thus  the  network  deployment  needs  to  be  carefully  done  to  use  such  a 
scheme  effectively. 


2.C.1.2  PicoNode  MAC  and  topology  control 

Author:  Chunlong  Guo 

This  research  covers: 

1.  Low  power  MAC  protocol:  simulation  and  verification  model  in  Omnet  and  SDL 

2.  Mobility  support  /  Dynamic  Addressing 

3.  Topology  Control:  Zone  Based  Topology  Control 

Recent progress in the first area includes a more complete comparative study of the proposed
protocol and existing protocols. A subset of the protocol has been implemented in the PicoNode
Test Bed, which revealed some potential problems in the original design. A more realistic
channel model has also been incorporated.

A novel architecture with a static network skeleton plus mobile nodes using a different address
space has been proposed. This avoids the stability problem caused by mobile nodes frequently
triggering channel reassignment. The new protocol identifies two kinds of mobile
nodes: wanderers and transporters. The former is a mobile information source, while the latter can
be an information collecting agent.

Substantial progress has been made in the topology control protocol. Most of the existing work in
this area considers only local connectivity, which leads to algorithms that result in poor global
connectivity. The basic framework of cone-based topology control comes from a recent study at
Microsoft Research. A zone-based topology control was developed, which results in optimal
global connectivity while keeping the computational complexity low. The basic idea of the new
protocol is to add directional information to the local connectivity control algorithm. After the
initial topology setup, every node in the network has roughly the same number of neighbors (3 to 6),
and the network lifetime is substantially longer (about 2 times) than that of a network without topology control.
Simulation results are shown in Figure 28.



Figure 28: Number of nodes alive in networks
with and without topology control.


2.C.1.2.1 Implementation of data link layer for PicoNode 3

Authors: Lizhi Charlie Zhong, Mei Xu, and Jie Zhou

A framework has been developed where models for components in the data link layer are
brought together with models for the network layer and the channel. This framework includes all
the factors that influence the design of the data link layer and enables us to study the design of
any component in the data link layer in the context of other components and layers. As a case
study, models have been built for a commonly used MAC/link design. Banach’s fixed-point
theorem is applied to solve the closed-loop problem encountered. Network simulations using
OMNET++ verify our models. The impact of some important parameters has also been studied.
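
A closed-loop model of this kind typically has the form x = g(x): for example, the collision probability depends on the traffic load, while the load (through retransmissions) depends on the collision probability. When g is a contraction, Banach’s fixed-point theorem guarantees a unique solution that simple iteration converges to. The sketch below uses a toy mapping g(p) = 1 - exp(-a(1 + p)) with a hypothetical offered load a = 0.2; it is not the MAC/link model from this study, only an illustration of the technique.

```python
import math


def fixed_point(g, x0, tol=1e-12, max_iter=1000):
    """Iterate x <- g(x) until successive values differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")


OFFERED_LOAD = 0.2  # hypothetical normalized load, not taken from the study


def collision_probability(p):
    """Toy closed loop: retransmissions raise the load, which raises collisions.

    The derivative is at most OFFERED_LOAD < 1, so this is a contraction
    on [0, 1] and the iteration converges to a unique fixed point.
    """
    return 1.0 - math.exp(-OFFERED_LOAD * (1.0 + p))


p_star = fixed_point(collision_probability, x0=0.0)
print(f"self-consistent collision probability: {p_star:.4f}")
```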

The data link layer (DLL) provides self-initialization of PicoNodes, maintenance of local topology,
and forwarding of packets to and from upper layers. It also supports error control and controls the
usage of the physical channels. In the initialization phase, a PicoNode recognizes and adds new
neighboring nodes to its neighbor list. To maintain its local topology, a PicoNode periodically
communicates with its neighbors and updates its neighbor list. The DLL receives packets and
then dispatches these packets to the network layer, the localization engine, or the DLL
packet-processing block. It also adds DLL headers to outgoing packets. The transmit and receive data
paths consist of packet queues, serializer/deserializer, CRC, memory buffer, line balancer, etc.

The  MAC  sub-layer  supports  sleep  mode  of  the  physical  layer  by  using  a  cycled  receiver  with  a 
parameterized  duty  cycle.  The  physical  layer  provides  two  channels,  one  channel  for  the 
broadcast  and  the  other  for  unicast.  The  radio  listens  to  the  broadcast  channel  for  a  certain  period 
of  time,  and  then  it  goes  to  the  sleep  mode  to  save  power.  The  MAC  broadcasts  beacons 
repetitively  to  the  receiver  in  order  to  set  up  a  unicast  session.  A  CSMA  scheme  is  used  before 
each  transmission  to  decrease  chances  of  collisions. 

The DLL model has incorporated new features such as power management, parameterization,
and statistics. The power management feature allows the state transition diagrams (STDs) to
operate at the minimum power consumption level. To implement the power management
mechanism, each STD informs the system supervisor when it is in the idle state, so that the
system supervisor can turn it off. Before an STD is turned off, some state variables are
exported to registers located outside of the STD, so that the present status of the STD is
retained for future use. When an STD wants to communicate with a block in another power
domain, it notifies the system supervisor to turn on that power domain first. Parameterization
makes the DLL programmable at run time and is also used for debugging purposes. Timer values and
packet types are parameterized, so that they can be set in software on the fly.
The current design is able to collect statistics such as network
traffic patterns and packet error rate. This information can be used to further understand
sensor network behavior and provide feedback for its improvement.
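
The power management handshake described above can be sketched in software. The class and method names below are hypothetical and the model is purely behavioral: an idle block reports to the supervisor, its state is saved to retention storage outside the block before power is removed, and the state is restored when another block asks for that domain to be powered on again.

```python
class StateMachine:
    """Stand-in for one state transition diagram (STD) in its own power domain."""

    def __init__(self, name):
        self.name = name
        self.state = "IDLE"
        self.packets_seen = 0

    def export_state(self):
        return {"state": self.state, "packets_seen": self.packets_seen}

    def restore_state(self, saved):
        self.state = saved["state"]
        self.packets_seen = saved["packets_seen"]

    def power_off(self):
        # Internal registers lose their contents when the domain is off.
        self.state = None
        self.packets_seen = None


class SystemSupervisor:
    """Switches power domains and keeps retention copies of exported state."""

    def __init__(self):
        self.blocks = {}
        self.powered = {}
        self.retention = {}

    def add(self, block):
        self.blocks[block.name] = block
        self.powered[block.name] = True

    def notify_idle(self, name):
        block = self.blocks[name]
        self.retention[name] = block.export_state()  # save before power-down
        block.power_off()
        self.powered[name] = False

    def request_power_on(self, name):
        if not self.powered[name]:
            self.powered[name] = True
            self.blocks[name].restore_state(self.retention.pop(name))
        return self.blocks[name]


supervisor = SystemSupervisor()
rx = StateMachine("rx")
supervisor.add(rx)
rx.packets_seen = 3
supervisor.notify_idle("rx")                 # rx reports idle and is switched off
block = supervisor.request_power_on("rx")    # another block needs rx again
print(block.state, block.packets_seen)
```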

The current design has been done in the Matlab Simulink/Stateflow development environment. Control
blocks are implemented in Stateflow, and the data paths are built from Simulink blocks. Xilinx
blocks are used for synthesis efficiency. The SSHAFT/BEE design flow is used to translate the
design into VHDL, and eventually down to an ASIC, completing the first step towards making a
fully functional PicoNode3 chip.


2.C.1.2.2  Integrated  physical  and  link  layer  strategies  for  PicoRadio 

Author:  En-Yi  Lin 

Recent advances in wireless communication and electronics have led to the development of
sensor networks. In sensor networks, unlike in ad hoc networks, the most critical factors are not
bandwidth efficiency, packet throughput or latency, but power efficiency and scalability. These
different emphases make the design choices over the protocol stack in sensor networks very
different from those of ad hoc networks. The focus of my research is to analyze the tradeoffs
between low power and acceptable QoS in a sensor network.

In this research, a power model including both the physical layer and data link layer is built.
On-Off keying is the modulation scheme assumed, according to the current design in PicoRadio.
Various wireless channel models are assumed. Power consumption in actual circuit
components as well as from a communication perspective (SNR/BER) is considered. Moving
one layer upward, different MAC protocols are designed and analyzed considering not only
throughput and delay, but also the support needed from the physical layer, for example, number
of channels, data rate, etc. When circuit complexity is taken into account in the power
consumption, it is found that the traditional way of designing the communication system
(modulation/demodulation, MAC protocol, error control coding, etc.) does not necessarily lead
to the true optimum. The purpose of this research is to integrate across both the physical
and data link layers to find the optimum operating point in terms of low power consumption and
performance.


2.C.1.3  Design  methodology  for  PicoRadio 

Authors:  Rong  Chen  and  Marco  Sgroi 

Pico-radio  is  an  ad-hoc  sensor  network,  but  its  design  should  not  be  ad-hoc  at  all.  In  fact,  its 
design  has  been  following  the  platform-based  design  principle. 

In the beginning, Pico-radio is conceptually expressed in English, and then it is transformed
formally into UML diagrams. A novel UML profile, called UML Platform, has been proposed to
fully support the capture of such a design specification.

Then,  the  UML  specification  is  further  transformed  into  the  Metropolis  meta-model,  where 
computation  and  communication  specifications,  function  and  architecture  specifications  are 
completely  orthogonal  to  facilitate  the  design  space  exploration  and  to  maximize  the  design 
reuse. 

Once the design is expressed in the Metropolis meta-model, the Metropolis framework provides
tool support to verify whether certain design properties are expressed consistently across different protocol
layers, and whether certain design constraints can be satisfied by the current protocol.

Within Metropolis, functional blocks are then refined and mapped onto different architecture
blocks. Such a mapping is not unique, and with certain cost metrics, different mappings can be
compared to minimize the overall cost.

Finally,  once  a  mapping  has  been  chosen  as  the  most  "suitable"  implementation,  the  C  code  will 
be  synthesized  for  those  functional  blocks  mapped  into  software,  and  RTL  code  will  be 
synthesized  for  those  functional  blocks  mapped  into  hardware. 

2.C.2 Positioning algorithms


2.C.2.1  Determining  position  using  RF  phase  differences 

Author:  Tufan  Karalar 

Low-power locationing systems are essential parts of distributed sensor networks. As a part of the
PicoNode 3 digital protocol processing chip, a locationing block is being implemented in
silicon. Hop counts from certain sensor nodes with known positions (called anchors) are used
to estimate the position of the node. The tasks of the block include executing the least-squares (LS) position
estimation algorithm, also called triangulation, as well as encoding and decoding the PicoRadio
packets that contain locationing information.
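
A least-squares position estimate from anchor distances can be sketched as follows. This is a generic linearized formulation in two dimensions, not the silicon implementation: the range equation of the last anchor is subtracted from the others to obtain linear equations in (x, y), which are solved through the 2x2 normal equations. Converting hop counts to distances by multiplying with an average hop length is an assumption made here for illustration.

```python
import math


def hops_to_distance(hops, avg_hop_length):
    """Assumed conversion: distance is hop count times an average hop length."""
    return hops * avg_hop_length


def ls_position(anchors, distances):
    """Least-squares 2D position from at least three anchors and their distances."""
    (xn, yn), dn = anchors[-1], distances[-1]
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (xi, yi), di in zip(anchors[:-1], distances[:-1]):
        # Row of the linear system obtained by subtracting the last range equation.
        a1 = 2.0 * (xi - xn)
        a2 = 2.0 * (yi - yn)
        b = xi**2 - xn**2 + yi**2 - yn**2 - di**2 + dn**2
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * b
        t2 += a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear or too few")
    x = (s22 * t1 - s12 * t2) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y


# Hypothetical layout: four anchors at the corners of a 10 x 10 area.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_position = (3.0, 4.0)
distances = [math.hypot(true_position[0] - ax, true_position[1] - ay)
             for ax, ay in anchors]
print(ls_position(anchors, distances))
```

With exact distances the estimate coincides with the true position; with hop-based distances the same solver returns the best fit in the least-squares sense.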

In future work, the actual distances between nodes, instead of hop counts, are to be measured
using radio signals. The scheme is planned to use time-of-flight measurements of a
wideband pseudo-noise signal. The advantage of this scheme
is that, given enough bandwidth, the multipath components can be resolved. Multipath is one of
the biggest problems of indoor distance measurement schemes, and this distance measurement
technique is robust against it. Furthermore, it is also a candidate for low-power implementation, which
can make it attractive for a low-power sensor network.
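
The link between bandwidth and multipath resolution can be made concrete with a rule of thumb: a signal of bandwidth B can separate arrivals whose delays differ by roughly 1/B, which corresponds to a path-length difference of about c/B. The sketch below applies this approximation; the bandwidth and delay values are hypothetical examples, not design parameters of the planned scheme.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def tof_distance(time_of_flight_s):
    """One-way distance travelled by a radio signal in the given time."""
    return SPEED_OF_LIGHT * time_of_flight_s


def path_resolution(bandwidth_hz):
    """Approximate smallest path-length difference that can be resolved (c / B)."""
    return SPEED_OF_LIGHT / bandwidth_hz


for bandwidth in (10e6, 100e6, 1e9):
    print(f"{bandwidth / 1e6:7.0f} MHz -> about {path_resolution(bandwidth):.2f} m")

print(f"10 ns time of flight -> {tof_distance(10e-9):.2f} m")
```

Indoors, where reflected paths are often only a few metres longer than the direct path, this is why a wideband signal is needed to separate them.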


2.C.2.2  Localization  in  sensor  networks 

Author:  Jana  van  Greunen 

This  research  considers  the  problem  of  localization  in  low-cost,  wireless  sensor  networks. 
Localization  refers  to  the  process  by  which  the  nodes  in  a  sensor  network  discover  their 
geographical  location.  Localization  is  important  because  many  applications  of  sensor  networks 
rely  heavily  on  the  sensor  nodes’  ability  to  establish  position  information.  Chris  Savarese,  a 
former  student,  developed  a  localization  algorithm  for  PicoRadio  (Savarese  2002).  His  algorithm 
employs  range  measurements  between  pairs  of  nodes  and  a  priori  coordinates  of  at  least  three 
reference  nodes.  It  is  fully  distributed  and  requires  relatively  low  communication  and 
computation  energy  from  each  node  in  the  network. 

More  specifically,  the  localization  algorithm  has  two  stages:  start-up  and  refinement.  During  the 
start-up  phase,  the  Triangulation  via  Extended  Range  and  Redundant  Association  of  Intermediate 
Nodes  (TERRAIN)  is  initiated  at  each  node  in  the  network.  TERRAIN  provides  an  initial 
position  estimate  for  nodes  in  the  network  via  triangulation  based  on  the  number  of  hops  to 
different  reference  nodes.  After  a  node  has  completed  the  start-up  phase  it  enters  a  second  phase 
called  refinement.  In  the  refinement  stage,  each  node  in  the  network  iteratively  measures 
distances  to  its  one-hop  neighbors  and  calculates  a  new  position  estimate  using  weighted 
maximum-likelihood  estimation.  Results  have  shown  that  this  algorithm  is  capable  of  producing 
error estimates as low as 5% in the presence of range errors. Figure 29 shows the average
position error after the start-up algorithm for one hundred simulated networks, each with 400
nodes placed randomly in a rectangle, and 5% range error.

Figure 29: Average position error AFTER hop-TERRAIN
(5% range errors)

For  the  same  simulation,  Figure  30  shows  the  position  error  after  refinement. 


Figure  30 :  Average  position  error  AFTER  refinement 
(5%  range  errors) 


Despite promising results, the convergence speed and accuracy of the algorithm are heavily
dependent on the topology and number of reference nodes. The next goal of this project is to
research  and  analyze  the  convergence  behavior  of  the  localization  algorithm  under  different 
network  conditions.  The  objective  is  to  increase  the  algorithm’s  robustness  and  incorporate 
features  such  as  current  error  estimation,  early  termination  when  error  estimates  are  low,  and 
better  convergence  guarantees. 

2.C.3 PicoNode III implementation strategies

2.C.3.1  Low  power  operating  system  for  wireless  networks 

Author:  Suet-Fei  Li 

Part of this project's goal is to develop an efficient OS for complex real-time, power-critical,
reactive systems implemented on advanced heterogeneous architectures. An event-driven OS,
developed specifically to target reactive event-driven systems, is much more efficient than a
traditional general-purpose OS. TinyOS, an existing event-driven OS, offers some very attractive
concepts, but is insufficient to fulfil the ambitious management role demanded. To overcome the
limitations  of  TinyOS,  we  proposed  an  event-driven  hierarchical  power  management  framework 
as  shown  in  Figure  31.  The  hierarchical  structure  enhances  design  scalability,  supports 
concurrency  in  both  the  application  domain  and  architecture  and  enables  power  control  at 
various  granularities.  The  software  management  framework  implements  a  hybrid  power  control 
policy  that  consists  of  a  central  power  scheduler  and  distributed  control  units. 


Figure 31: Hierarchical Power Management Framework


2.C.3.2 Leakage current management in deep sub-micron ICs

Author:  Huifang  Qin 

Deep-submicron technology in current and future integrated circuit design leads to increasing
leakage energy dissipation. Effective leakage control techniques are required for any low-power
application. With high transistor density, on-chip memory leakage has become a
significant part of the system power consumption. Ultra-low standby supply
voltage techniques for memory leakage suppression are being studied through both simulation
and test chip fabrication.

The proposed approach pushes the standby supply voltage of the memory
module, or of logic with hard state, to the data retention limit. Thus the information in the memory is
preserved while the leakage power is effectively reduced. Simulation results show that the
technique provides promising leakage power savings (roughly 90%), acceptable wake-up delay, ensured
standby data preservation, and controllable operation noise. A 1 KB SRAM test chip with standby
control logic has been implemented in 0.13 µm CMOS and will be tested soon.


Figure  32 :  1  KB  SRAM  test  chip 


2.C.3.3  PicoNode  III  system  implementation 

Author:  Mika  Kuulusa 

Figure  33  is  a  system  block  diagram  of  PicoNode  III,  or  Quark  node.  It  is  made  from  two  custom 
chips,  Strange  RF  and  Charm  digital  processor,  and  is  complemented  by  a  set  of  peripheral 
circuitry  for  non-volatile  storage,  clock  generation,  and  signal  conversion. 

The PicoRadio project has advanced to its final phase, in which all research efforts will be
merged into PicoNode III, designated as the Quark node (Rabaey et al. 2002). The
Quark node is an ultra-low power wireless sensor node that contains a 1.2"x2.0" (30x50mm)
system board, a lithium-polymer battery, and a solar cell. In addition, the system board
incorporates supplementary peripherals for temperature sensing, voltage regulation, clocking,
and non-volatile program storage. The Quark board is based on a chipset comprising the
Strange (analog OOK transceiver) and Charm (digital processor) chips.
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Figure  33 :  System  block  diagram  of  the  Quark  node 


The Strange prototype chip (~2 mm², 0.13 µm CMOS) combines Film Bulk Acoustic Resonator
(FBAR) MEMS components with CMOS circuitry to generate the local 1.9 GHz carrier
frequency. Whereas conventional CMOS oscillators take several hundred microseconds to start up,
the MEMS/CMOS design can be powered on and off and stabilizes in approximately 0.3
microseconds. This behavior, combined with simple OOK (On-Off Keying) modulation, allows
us to use energy-efficient non-linear power amplifier circuits for the transmitter section. The
integrated transceiver supports a minimum data rate of 10 kbps, 0 dBm transmit power, and -70 dBm
sensitivity, and it draws 3-4 mW from a 1 V supply.
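
To put the start-up figures in perspective, the short sketch below expresses the oscillator start-up time as a fraction of one bit period. The 0.3 µs and 10 kbps values come from the text; the 300 µs value is only an assumed stand-in for "several hundred microseconds", and the function names are illustrative.

```python
def bit_period_us(data_rate_kbps):
    """Duration of one bit in microseconds."""
    return 1000.0 / data_rate_kbps


def startup_fraction_of_bit(startup_us, data_rate_kbps):
    """Oscillator start-up time expressed as a fraction of one bit period."""
    return startup_us / bit_period_us(data_rate_kbps)


# 0.3 us and 10 kbps come from the text; 300 us is an assumed stand-in
# for "several hundred microseconds".
fbar_fraction = startup_fraction_of_bit(0.3, 10.0)
conventional_fraction = startup_fraction_of_bit(300.0, 10.0)
```

Under these assumptions the FBAR-based oscillator settles within a small fraction of a bit, whereas the assumed conventional start-up time spans several bit periods, which is why keying the oscillator directly is only practical in the first case.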

The Charm chip (~3 mm², 0.13 µm CMOS) implements the digital baseband, data link, network,
and application layers of the PicoRadio protocol. The baseband and Medium Access Control
(MAC) will be implemented as application-specific hardware using HDL programming and also
graphical design entry in the form of Matlab/Simulink descriptions for implementation in the
SSHAFT design flow (Davis et al. 2001). The higher layers are executed on a synthesizable 8051
microcontroller (20 kgates, ~2 MIPS) to provide the implementation flexibility that is desirable for
user applications and for further refinement of the ad-hoc network routing algorithms (Shah and
Rabaey 2002). In addition, the Charm chip contains a Localization Engine, which is hardware optimized
for performing LMS-algorithm and triangulation operations. Because the standby power
consumption in 0.13 µm CMOS technologies will be dominated by high transistor leakage
currents, the Charm chip integrates an intelligent power controller that can enable or disable either
the digital clock or the power supply to each functional unit on the chip. This is accomplished with a set
of microcoded event sequences for various states of the protocol stack. Moreover, the Charm chip will
incorporate a switched-capacitor voltage regulator providing a 200 mV data retention voltage for
the on-chip SRAM memories. A prototype chip is in manufacturing, and according to simulations
this method will reduce the idle-mode power consumption of the memories by 90%.

Two energy-scavenging options have emerged as feasible: solar cells and piezoelectric
vibration energy converters. In the targeted indoor office environment, solar cells provide around
1 mW/cm² and piezoelectric converters deliver up to 0.1 mW (in 1 cm³) of continuous power.

2.C.3.4 Algorithms and VLSI implementations of low power digital baseband timing recovery systems for wireless communications

Author:  M.  Josie  Ammer 

This research addresses the algorithms and implementations for digital baseband timing recovery
in wireless receivers. Timing recovery refers to the estimation and tracking of several
non-idealities in the received signal caused by (1) the wireless channel itself, and (2) the RF and
analog circuits in the transmitter and receiver. Parameters to be estimated include: (1) frequency,
(2) phase, (3) sampling instant, and (4) gain, including multipath and scattering effects. This
research looks specifically at timing recovery performed on the baseband signal (after
down-conversion from the carrier) in the digital domain (after the analog-to-digital converter) and is
particularly concerned with lowering the power consumption of the total receiver.
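
As an illustration of estimating one of the parameters listed above, the sketch below recovers a carrier frequency offset from complex baseband samples using the phase of the lag-one autocorrelation. This is a generic textbook estimator and is not necessarily the algorithm used in the PicoRadio baseband; the function names, the sample rate, and the noiseless test tone are assumptions made for illustration.

```python
import cmath
import math


def estimate_frequency_offset(samples, sample_rate_hz):
    """Estimate the frequency offset (Hz) of a complex baseband tone.

    Uses the angle of the lag-one autocorrelation, which is unambiguous
    only for offsets within +/- sample_rate_hz / 2.
    """
    acc = 0j
    for previous, current in zip(samples, samples[1:]):
        acc += current * previous.conjugate()
    return cmath.phase(acc) * sample_rate_hz / (2.0 * math.pi)


def make_tone(offset_hz, phase_rad, sample_rate_hz, count):
    """Generate a noiseless complex tone (hypothetical test input)."""
    step = 2.0 * math.pi * offset_hz / sample_rate_hz
    return [cmath.exp(1j * (step * n + phase_rad)) for n in range(count)]


tone = make_tone(offset_hz=12_500.0, phase_rad=0.7,
                 sample_rate_hz=1_600_000.0, count=256)
estimated_hz = estimate_frequency_offset(tone, 1_600_000.0)
```

The estimate does not depend on the unknown carrier phase, because a constant phase cancels in each product of a sample with the conjugate of its predecessor.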

Digital  baseband  timing  recovery  can  ease  the  design  of  the  analog  and  RF  circuitry  by 
correcting  for  non-idealities  caused  by  sub-optimal  implementations.  This  tradeoff  becomes 
especially  important  in  single-chip  radios  when  the  RF  and  analog  circuitry  needs  to  be 
implemented in an ostensibly digital process with low voltages, which is a difficult task. By transferring
some  of  the  complexity  to  the  digital  domain,  it  is  conjectured  that  the  entire  system  can 
consume  less  power.  This  work  is  taking  place  within  the  PicoRadio  project  where  low  power  is 
the  primary  goal.  We  investigate  the  architectural  and  implementation  issues  related  to  building 
low  power  baseband  timing  recovery  systems  in  VLSI. 

In this research, the computational hardware requirements for timing recovery on the various
PicoRadio physical layers provide a platform for evaluating digital baseband timing
recovery systems. A low-power 1.6 Mbps baseband timing recovery processor (BBP) has been
developed for the PicoNode Phase 2 (PN2) system using a custom ASIC design flow. Aggressive
clock gating and supply voltage scaling are used to reduce power consumption. The BBP ASIC is
implemented in a triple-well, 0.18 µm digital CMOS process with 6 metal layers. The cores use a
1.0 V supply voltage, and the external I/O uses 1.8 V. The 600k-transistor ASIC has a core area of
2.2 mm² and a pad-limited die area of 14.5 mm². The clock frequency is 25 MHz, and the measured
worst-case power consumption during data receive/transmit, carrier search, and code acquisition
modes is 15 mW. The lessons learned from the PN2 system are incorporated into the PicoNode
Phase 3 (PN3) system. The baseband timing recovery processor for the PN3 system is near
completion.
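
The quoted BBP figures can be combined into two rough figures of merit. The sketch below is only a back-of-the-envelope calculation: it assumes the worst-case 15 mW is drawn while data flows at the full 1.6 Mbps rate, so the energy per bit is an upper-bound style estimate rather than a measured value, and the function names are illustrative.

```python
def energy_per_bit_nj(power_mw, data_rate_mbps):
    """Energy per bit in nanojoules (mW divided by Mbps gives nJ per bit)."""
    return power_mw / data_rate_mbps


def clock_cycles_per_bit(clock_mhz, data_rate_mbps):
    """Number of clock cycles available to process each received bit."""
    return clock_mhz / data_rate_mbps


# 15 mW, 1.6 Mbps, and 25 MHz are the values quoted in the text.
bbp_energy_nj = energy_per_bit_nj(15.0, 1.6)
bbp_cycles = clock_cycles_per_bit(25.0, 1.6)
```

With these inputs the processor spends on the order of 9 nJ and about 16 clock cycles per bit.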

The  ongoing  efforts  include  modification  of  algorithms,  and  the  efficient  mapping  of  these 
algorithms  into  architectures  and  VLSI  implementations  that  provide  the  final  measure  of 
complexity and power consumption.

Figure 34: PN2 baseband timing recovery processor chip plot


2.C.3.5  Summary  of  PicoNode  III  chip  set  implementation 

Author:  Mike  Sheets 

This  research  is  focused  on  exploiting  system-level  characteristics  of  wireless  sensor  nodes,  to 
reduce  the  power  consumption  of  the  chip. 

The  type  of  communication  within  a  wireless  sensor  node  follows  a  reactive,  event-driven 
model.  At  a  most  basic  level,  the  layers  of  a  protocol  stack  can  be  thought  of  as  logic  that 
potentially  produces  data  in  response  to  some  input  stimulus.  This  stimulus  may  come  from  a 
number  of  sources,  including  detection  of  energy  on  the  wireless  channel,  expiration  of  a  system 
timer,  and  communication  from  another  component  on  the  chip.  Components  on  the  chip  can 
remain  inactive  until  a  stimulus  is  received.  The  model  for  execution  is  that  when  a  stimulus 
(event) is received, the block wakes up, processes the event, and goes inactive again. To
implement this model, the PN3 digital chip (Charm) is divided into sub-blocks with well-specified
interfaces to other components in the system. This allows the interactions between sub-blocks
to be observed and managed by a system supervisor.

One  of  the  driving  aspects  of  the  system  supervisor  is  to  control  the  power  consumed  by  the  chip. 
As  process  dimensions  decrease,  leakage  power  is  becoming  an  increasingly  significant  portion 
of  total  power.  This  is  particularly  true  for  low  active  duty  cycle  systems,  such  as  PicoRadio 
sensor  nodes.  Standby  leakage  power  is  addressed  by  extending  the  notion  of  clock  domains  to 
power domains. A power domain is a region of logic that can be “turned off” independently of
the other regions. In the Charm chip, this gating is performed using a virtual Vdd supply rail that
can  be  disconnected  from  the  chip  power  supply  using  control  logic.  When  in  the  inactive  mode, 
the  leakage  of  the  block  is  reduced  markedly. 

Since  blocks  can  be  active  or  inactive,  a  mechanism  is  required  to  ensure  that  the  destination  of 
on-chip  communication  is  actually  active.  This  mechanism  is  supported  through  control 
messages  exchanged  with  the  system  supervisor.  When  a  block  wishes  to  communicate  with 
another  component  on  the  chip,  it  requests  a  communication  session  from  the  system  supervisor. 
The  system  supervisor  then  ensures  that  both  the  source  and  destination  of  the  session  are  kept  in 
an  active  state  as  long  as  the  session  is  open.  Once  a  session  is  established,  the  source  and 
destination can then communicate peer-to-peer. This session-based approach works well for
sensor  nodes,  because  most  communication  involves  passage  of  packets  between  protocol  layers. 
The  overhead  for  the  system  supervisor  is  then  amortized  over  the  signaling  required  for  the 
entire  packet. 
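
The session mechanism described above can be sketched as a small software model. This is a toy illustration only: the class, method, and block names are hypothetical, and the real supervisor is dedicated hardware whose control messages are not specified here. The model keeps a per-block count of open sessions and treats a block as active while that count is non-zero.

```python
class SystemSupervisor:
    """Toy model of session-based wake-up control (names are hypothetical)."""

    def __init__(self, block_names):
        # Number of open sessions that currently involve each block.
        self._open_sessions = {name: 0 for name in block_names}

    def is_active(self, name):
        return self._open_sessions[name] > 0

    def open_session(self, source, destination):
        """Keep both endpoints active and return a session handle."""
        for name in (source, destination):
            self._open_sessions[name] += 1
        return (source, destination)

    def close_session(self, session):
        """Release both endpoints; a block may sleep when its count is zero."""
        for name in session:
            self._open_sessions[name] -= 1


supervisor = SystemSupervisor(["baseband", "mac", "network"])
session = supervisor.open_session("baseband", "mac")
active_during = [supervisor.is_active(n) for n in ("baseband", "mac", "network")]
supervisor.close_session(session)
active_after = [supervisor.is_active(n) for n in ("baseband", "mac", "network")]
```

Counting sessions rather than storing a single flag lets a block take part in several sessions at once without being put to sleep when only one of them closes.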

This approach allows most components on the chip to be in the inactive mode for the majority of
the time, but in practice there are two types of inactive mode. The first type involves logic in which
state need not be preserved between activations. An example of this type is baseband logic that
must resynchronize for every packet. The virtual Vdd for this type of logic can be allowed to
discharge all the way to ground. The second type, however, requires that state be preserved
between activations. An example of this type is a microprocessor whose code and data memory
contents must be retained. For this type of logic, the virtual Vdd can be reduced in inactive mode
to a lower “data retention” voltage. At this voltage, the leakage is reduced, but the state is
preserved when the power rails are restored to their full voltage levels. The data retention voltage
can be generated using an on-chip switched-capacitor DC-DC converter. Since the logic
connected to the converter never switches while in inactive mode, the
DC-DC converter need only supply the small leakage current. This allows a single,
relatively small converter to provide the data retention voltage for the entire chip.
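
The distinction between the two inactive modes can be made concrete with a small behavioral model. This is only a sketch under simplifying assumptions: the class and attribute names are hypothetical, voltages are not modeled, and losing state is represented simply by clearing a dictionary.

```python
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"
    RETENTION = "retention"   # virtual Vdd lowered to the data retention voltage
    OFF = "off"               # virtual Vdd allowed to discharge to ground


class PowerDomain:
    """Toy power domain; `retains_state` selects which inactive mode is used."""

    def __init__(self, name, retains_state):
        self.name = name
        self.retains_state = retains_state
        self.mode = Mode.ACTIVE
        self.state = {}

    def sleep(self):
        if self.retains_state:
            self.mode = Mode.RETENTION
        else:
            self.mode = Mode.OFF
            self.state = {}   # contents are lost when the rail collapses

    def wake(self):
        self.mode = Mode.ACTIVE


baseband = PowerDomain("baseband", retains_state=False)
micro = PowerDomain("8051", retains_state=True)
for domain in (baseband, micro):
    domain.state["value"] = 42
    domain.sleep()
    domain.wake()
```

After one sleep and wake cycle the baseband domain starts from empty state, as it would when resynchronizing for a new packet, while the microcontroller domain still holds its contents.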

Since  the  blocks  cannot  have  any  switching  activity  while  in  inactive  mode,  logic  that  involves 
free-running  counters  (such  as  timers)  must  be  handled  external  to  the  block.  The  system 
supervisor  supports  this  by  providing  a  number  of  virtual  timers.  Blocks  can  schedule  themselves 
to  be  awoken  at  a  future  time  by  requesting  an  alarm  from  the  system  supervisor.  The  block  can 
then  enter  inactive  mode  and  the  supervisor  will  reactivate  it  when  the  timer  expires.  The  virtual 
timers  are  implemented  in  the  system  supervisor  using  a  single  system  time  wheel.  When  added 
to  the  alarm  table,  the  alarms  are  sorted  according  to  their  expiration  time.  A  low-power  digital 
comparator  is  used  to  minimize  switching  activity  while  continuously  comparing  the  next  alarm 
time  to  the  current  time. 
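
A software analogue of the sorted alarm table is sketched below. It is a minimal model under stated assumptions: the class and method names are hypothetical, time advances in whole ticks, and the hardware comparator is represented by comparing only the earliest alarm with the current time.

```python
import bisect


class TimeWheel:
    """Toy alarm table kept sorted by expiration tick (hypothetical API)."""

    def __init__(self):
        self.now = 0
        self._alarms = []   # sorted list of (expiry_tick, block_name)

    def request_alarm(self, block_name, delay_ticks):
        """Insert an alarm so that the table stays sorted by expiry time."""
        bisect.insort(self._alarms, (self.now + delay_ticks, block_name))

    def tick(self):
        """Advance one tick; return the blocks whose alarms have expired."""
        self.now += 1
        woken = []
        # Only the earliest alarm is compared against the current time.
        while self._alarms and self._alarms[0][0] <= self.now:
            woken.append(self._alarms.pop(0)[1])
        return woken


wheel = TimeWheel()
wheel.request_alarm("mac", 3)
wheel.request_alarm("sensor", 1)
wheel.request_alarm("network", 3)
wake_log = [wheel.tick() for _ in range(3)]
```

Because the table is sorted on insertion, each tick needs a single comparison in the common case where no alarm is due, which mirrors the motivation for the low-power comparator.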


Since the system time wheel portion of the system supervisor is always active when alarms are
set, power is further reduced by using two system clocks. The external crystal will have a relatively
low frequency of about 16 kHz. The system time wheel is clocked directly from this crystal because it
provides adequate timing resolution for the alarms. An on-chip digital PLL will multiply this
clock to approximately 16 MHz for active-mode communication. When no blocks on the chip are
in active mode, the system supervisor can further reduce power consumption by disabling the
PLL.
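
The two clock frequencies imply a PLL multiplication factor and an alarm resolution, worked out in the sketch below. Both frequencies are approximate in the text, so the results are indicative only; the constant and function names are illustrative, and rounding a requested delay up to whole ticks is an assumption about how alarms would be quantized.

```python
import math

CRYSTAL_HZ = 16_000            # approximate external crystal frequency
ACTIVE_CLOCK_HZ = 16_000_000   # approximate PLL output frequency

pll_multiplier = ACTIVE_CLOCK_HZ // CRYSTAL_HZ
alarm_resolution_us = 1_000_000 / CRYSTAL_HZ


def delay_to_ticks(delay_ms):
    """Round a requested delay up to whole crystal ticks (assumed policy)."""
    return math.ceil(delay_ms * CRYSTAL_HZ / 1000)


ticks_for_10_ms = delay_to_ticks(10)
```

Under these assumptions the PLL multiplies the crystal clock by about 1000 and alarms resolve to roughly 62.5 microseconds, so a 10 ms alarm corresponds to 160 ticks of the time wheel.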


2.C.3.5.1 Microprocessor

The  following  describes  the  microprocessor  and  method  of  reducing  execution  time. 

The  microprocessor  chosen  for  the  PN3  prototype  is  a  synthesizable  variant  of  the  Intel  8051 
microcontroller.  In  the  PN3  prototype,  the  network  and  application  layers  are  implemented  in 
software  running  on  the  8051.  The  basic  functions  required  in  these  layers  are  to  process  the 
packet  locally,  generate  a  packet  to  be  sent  to  a  monitoring  node,  or  forward  a  packet  to  the  next 
hop  in  the  network.  Implementation  of  these  functions  requires  few  data  path  operations,  thus  a 
microcontroller  is  used  since  it  is  designed  to  run  control-dominated  software. 

During  simulation  and  emulation  of  the  system,  the  software  code  is  profiled  to  identify  where 
most  of  the  execution  time  (and  by  extension,  power)  is  spent  by  the  microcontroller.  The  goal  of 
this  profiling  is  to  identify  the  most  costly  operations  and  optimize  these  using  hardware 
accelerators.  The  analysis  revealed  that  almost  one  third  of  the  time  to  forward  a  packet  (11708 
cycles  out  of  38112  cycles  on  average)  is  spent  copying  data  from  the  receive  queue  to  the 
transmit  queue.  This  is  because  of  the  slow  access  time  for  the  microcontroller  to  read  and  write 
into  the  queues.  Implementation  of  a  direct  memory  access  accelerator  reduces  this  time  to  a  few 
dozen  clock  cycles,  reducing  the  microprocessor  power  consumption  by  almost  1/3  during  a 
packet-forwarding  scenario.  A  similar  approach  is  applied  to  the  remaining  code  until  the  duty 
cycle  for  the  microprocessor  is  reduced  to  <  5%. 
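
The profiling numbers above can be checked with a few lines of arithmetic. The cycle counts are taken from the text; the DMA cost of 36 cycles is an assumed stand-in for "a few dozen clock cycles", and the saving is computed in cycles, which equals the power saving only if power is proportional to active cycles.

```python
FORWARD_CYCLES = 38112   # average cycles to forward a packet (from the text)
COPY_CYCLES = 11708      # cycles spent copying between queues (from the text)
DMA_CYCLES = 36          # assumed stand-in for "a few dozen" cycles

copy_fraction = COPY_CYCLES / FORWARD_CYCLES
cycles_with_dma = FORWARD_CYCLES - COPY_CYCLES + DMA_CYCLES
saving_fraction = 1 - cycles_with_dma / FORWARD_CYCLES
```

Both fractions come out at just under 31%, consistent with the "almost one third" stated above.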


2.C.3.5.2  Emulation  environment 

This  section  describes  the  hardware  emulation  environment  for  a  node  and  the  plans  to  emulate 
an  entire  system. 

Due  to  the  significant  difference  in  time  scales  between  the  high-level  protocol  and  the  on-chip 
circuitry,  emulation  is  used  to  verify  the  correct  operation  of  the  system.  For  this 
implementation,  the  VHDL  code  is  targeted  to  a  Xilinx  FPGA.  Software  debugging  is  supported 
through  a  serial  interface  to  debugging  software  running  on  a  PC.  Hardware  debugging  is 
supported  through  the  Xilinx  ChipScope  logic  analysis  core. 

Future plans involve using the Berkeley Emulation Engine (BEE) (Chang et al. 2002) to instantiate a
network of nodes for system protocol testing. The BEE is a collection of 16 high-end Xilinx
FPGAs connected together to form a single system. It is expected that it can support 16 or more
complete nodes along with a channel model that simulates various network topologies. With this
approach, the correct function of the node and the high-level protocol can be verified before
finalizing the silicon.


2.C.3.6  PicoRadio  RF  transceiver 

Authors:  Brian  Otis,  Ulrich  Schuster,  and  Richard  Lu 

2.C.3.6.1 PicoRadio RF transceiver simulations

As  a  part  of  the  PicoRadio  project  to  build  a  ubiquitous  ad-hoc  wireless  sensor  node  network,  the 
physical  layer  has  to  provide  a  reliable  point-to-point  radio  link  under  very  tight  power 
constraints.  The  analog  transceiver  building  blocks  make  up  a  large  percentage  of  the  overall 
power  budget.  In  order  to  minimize  the  amount  of  energy  needed  to  convey  one  bit  of 
information,  new  strategies  have  to  be  employed  which  take  into  account  the  power  consumption 
not  only  in  a  communication  theoretic  sense  in  the  form  of  energy  transmitted  over  the  channel, 
but  also  the  energy  needed  to  meet  performance  requirements  of  the  analog  and  digital  building 
blocks  in  the  receiver  chain.  Following  this  approach,  the  PicoRadio  RF  group  is  implementing  a 
transceiver  utilizing  the  least  number  of  analog  components  possible  together  with  promising 
new  technologies  like  RF-MEMS  (Otis  and  Rabaey  2002).  This  research  focuses  on  modelling 
the  radio  link,  including  these  blocks. 

Although the goal is a very simple RF and baseband analog circuit, the analysis of the end-to-end
link is complicated, precisely because the system is no longer linear and the nonidealities are not
just a nuisance but a fundamental design parameter, trading off power against nonlinear operation
and a high system noise floor.

The  architecture  under  consideration  consists  of  a  directly  modulated  oscillator  and  a  power 
amplifier  as  the  transmitter,  a  tuned  RF  amplifier  and  an  envelope  detector  at  the  receiver  as 
shown  in  Figure  35.  To  assess  the  performance  in  terms  of  the  obtainable  error  probability,  a 
behavioral  simulation  model  includes  baseband  equivalent  models  for  all  the  blocks,  derived 
either  from  circuit  equations  or  via  a  curve-fitting  approach.  The  directly  modulated  oscillator 
limits  the  pulse-shaping  capabilities  to  simple  ON-OFF  keyed  modulation.  At  the  receiver  side, 
the envelope detector has a quadratic transfer characteristic, hence only a 1.5 dB increase in
transmit power is necessary to obtain a 3 dB increase in receive power.
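
The 1.5 dB to 3 dB relationship follows directly from the quadratic characteristic, as the sketch below shows for an ideal square-law detector. The model is an assumption for illustration (output proportional to the square of the input amplitude, with an arbitrary constant `k`), and it interprets the 3 dB figure as the change in detector output level; it is not the behavioral model used in the simulations.

```python
import math


def db_to_amplitude_ratio(db):
    """Convert a level change in dB to an amplitude (voltage) ratio."""
    return 10 ** (db / 20)


def square_law_output(amplitude, k=1.0):
    """Ideal quadratic envelope detector: output proportional to amplitude squared."""
    return k * amplitude ** 2


def output_change_db(input_change_db):
    """Change in detector output level, in dB, for a given input level change."""
    base = square_law_output(1.0)
    boosted = square_law_output(db_to_amplitude_ratio(input_change_db))
    return 20 * math.log10(boosted / base)


change_for_1_5_db = output_change_db(1.5)
```

Squaring the amplitude doubles any level change expressed in dB, so a 1.5 dB increase at the input appears as a 3 dB increase at the detector output.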

Most analog building blocks scale linearly or sub-linearly with the data rate (except for the
analog-to-digital converter). The power consumption of the analog subsystem is dominated by
biasing, hence an increase in data rate means that the whole radio can be put into sleep mode
longer. The simulation results shown in Figure 36 support a maximum achievable data rate with
the current radio of 160 kbps with a path loss of 64 dB and 0 dBm transmit power (~10 m Tx-Rx
separation in an indoor environment). Due to the direct conversion architecture, flicker noise is a
major concern, and more than 35 dB of RF gain is needed to overcome the noise.

Figure 36: PicoRadio RF transceiver bit error rate

From the simulation results, a further abstraction is possible into a model suitable for
semi-manual analysis. This model can be used to average over different fading states of the channel.
Fading  has  a  major  impact  on  the  system  performance,  as  every  lost  packet  is  wasted  energy. 
Classical  diversity  schemes  used  to  combat  fading  are  not  applicable  due  to  the  short  packets  and 
the  narrowband  system.  Opportunistic  diversity  schemes  where  data  is  only  transmitted  when  a 
good  channel  exists  are  an  appropriate  solution  as  long  as  the  additional  latency  can  be  tolerated. 
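
The trade-off between wasted energy and added latency can be illustrated with a simple closed-form model. Everything beyond the quoted link numbers is an assumption: flat Rayleigh fading with unit mean power gain, independent fading states from slot to slot, perfect and free channel knowledge at the transmitter, and the -70 dBm sensitivity quoted earlier for the Strange transceiver used as the reception threshold. The function names are illustrative.

```python
import math

TX_POWER_DBM = 0.0        # transmit power from the text
PATH_LOSS_DB = 64.0       # path loss from the text (about 10 m indoors)
SENSITIVITY_DBM = -70.0   # sensitivity quoted for the transceiver


def fade_margin_db():
    """Margin between the mean received power and the sensitivity."""
    return TX_POWER_DBM - PATH_LOSS_DB - SENSITIVITY_DBM


def good_channel_probability(margin_db):
    """P(fading power gain > 10^(-margin/10)) for Rayleigh fading, unit mean gain."""
    threshold = 10 ** (-margin_db / 10)
    return math.exp(-threshold)


def blind_energy_per_delivered_packet(packet_energy, p_good):
    """Average energy per delivered packet when sending regardless of channel state."""
    return packet_energy / p_good


def opportunistic_wait_slots(p_good):
    """Average number of slots until a good channel occurs (geometric distribution)."""
    return 1 / p_good


margin = fade_margin_db()
p_good = good_channel_probability(margin)
blind_energy = blind_energy_per_delivered_packet(1.0, p_good)
wait_slots = opportunistic_wait_slots(p_good)
```

In this model the blind transmitter spends about 1/p_good packet energies per delivered packet, while the opportunistic transmitter spends one packet energy but waits about 1/p_good slots on average, which is the latency cost mentioned above.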


2.C.3.6.2  PicoRadio  low  power,  low  noise  RF  amplifier 

In  order  to  achieve  the  low  power  goals  of  PicoRadio,  new  architectures  for  the  RF  receiver  must 
be  researched.  An  important  component  in  the  receiver  is  the  low  noise  amplifier.  For  this 
application,  the  low  noise  amplifier  must  provide  high  gain  and  adequate  noise  and  linearity 
while  consuming  minimal  power. 

In this research, a prototype based on an inductively degenerated common-source amplifier
with an RF MEMS FBAR resonator was fabricated in a 0.13 µm CMOS process. The FBAR
resonator is capable of providing a high-Q tank and narrowband filtering. Another advantage of
the resonator is that it can ultimately be integrated on-chip. In this architecture, the resonator is
used for tuning the output tank, as well as for providing high impedance at resonance in order to
generate gain. On-chip spiral inductors are used at the source for input impedance matching, and
in parallel with the FBAR resonator at the output to provide DC bias current through the
transistors. The gate inductor used to determine the resonant frequency is implemented off-chip.
This prototype is currently being characterized in the BWRC lab.

In the next-generation receiver, the amplification is divided between two stages. The first stage
consists of an LNA with an LC output tank to provide gain across a broader range of
frequencies (lower Q). The second stage is an RF amplifier that uses the FBAR filter to filter the
signal. In this design, the LNA uses a passive input matching network along with the non-quasi-static
gate resistance to maximize the gain as well as to provide a real 50 ohm input impedance at
the resonant frequency. Simulations have shown that the LNA is capable of providing 16 dB of
power gain (30 dB of voltage gain) with a noise figure of 2.6 dB while consuming only 1.8 mW,
as shown in Figures 37 and 38, respectively.


2.C.3.6.3  PicoRadio  transceiver  implementation 

Progress was made towards the design and implementation of an integrated, low-power
transceiver for the PicoRadio project. A test chip was designed, fabricated, and tested. New
CMOS/MEMS packaging methodologies were explored. In December, an entire prototype
transceiver was taped out in a 0.13 µm CMOS process.

TEST  CHIP 

A prototype test chip was fabricated, bonded, and tested. The chip, fabricated using a 0.13 µm
STMicroelectronics CMOS process, was designed to test and characterize various circuit blocks
that will ultimately constitute a complete prototype transceiver. See Figure 39 for a photograph
of the bonded chip. The CMOS test chip is bonded to two Agilent FBAR resonator chips. The
CMOS and FBAR chips are mounted with conductive epoxy to a gold substrate.

Figure 37: AC gain of LNA

Figure 38: Noise figure of LNA

Figure 39: CMOS/FBAR test chip


Testing  of  this  system  is  still  in  progress;  however,  the  following  results  have  been  measured: 

• 300 µW 1.9 GHz RF oscillator (revision 2) - Bonded to FBAR - Functional and
operates as expected.

• 200 nW envelope detector - Functional; transfer function measurements reveal that the
subthreshold slope parameter is larger than in simulation (measured n~1.65). The
n-value measurement was verified on a dedicated test device. The envelope detector
transfer function is shown in Figure 40. The X-axis shows the amplitude of the 2 GHz
RF input. The Y-axis shows the demodulated output voltage level. Six measurements
were taken across various die. The measured results agree well with the
theoretical behavioral transfer function described in Section 2.C.3.6.1.

• Oscillator/PA subsystem - Bonded to FBAR resonator. Constitutes entire test
transmitter - Functional.

• 1 mW 2 GHz LNA - Tuned with an FBAR resonator - Untested.

Figure 40: Measured envelope detector transfer function


TRANSCEIVER  PACKAGING  METHODOLOGY 

As  discussed  previously,  the  transceiver  prototype  test  chip  uses  separate  CMOS  and  MEMS 
chips. As such, the packaging and high-frequency interconnects between these chips are crucial
from  a  performance  standpoint.  Chip-on-board  (COB)  packaging  is  one  option  that  would  allow 
direct  chip-to-chip  interconnect  as  well  as  chip-to-board  interconnect.  To  test  the  applicability  of 
this  technology  to  low  power  CMOS/MEMS  components,  a  COB  board  was  designed  to  test  a 
1.9GHz  oscillator.  Figure  41  shows  the  completed  board. 

The  board,  which  has  been  tested  and  is  fully  functional,  allows  connections  from  the  MEMS 
chip to the 0.18 µm CMOS chip, as well as supply and output connections from the CMOS chip
to  the  board.  See  Figure  42  for  a  detailed  photograph  of  these  connections. 

The CMOS and MEMS chips were placed in close proximity to minimize the length of the
chip-to-chip bond wire interconnects. This is important, as large series inductance in this interconnect
could lead to parasitic oscillations. This COB prototype shows that effective and robust
CMOS/MEMS subsystems can be constructed and efficiently packaged. It also implies that, for
small form-factor packaging, the MEMS and CMOS chips can be bonded together within one
package.

Figure 41: COB oscillator test board

Figure 42: Wiring details of COB test board


TRANSCEIVER  PROTOTYPE 

A  complete  two  channel,  low  power,  integrated  transceiver  has  been  designed.  See  Figure  43  for 
a  block  diagram  of  the  receiver. 

The  receiver  contains  no  mixers,  and  relies  on  the  high  Q  filtering  of  MEMS  resonators  for  band 
selection.  The  power  consumption  of  the  prototype  receiver  is  approximately  3mW.  The 
transmitter is shown in Figure 44.

Figure 43: Prototype receiver block diagram


Figure 44: Prototype transmitter block diagram


The  prototype  two-channel  transmitter  consists  of  two  RF  oscillators.  These  oscillators  use 
Agilent  FBAR  MEMS  resonators  and  were  published  at  ESSCIRC  2002. 


CIRCUIT  INNOVATIONS 

As the field of RF MEMS continues to develop, the need for circuit/MEMS co-design becomes
increasingly important. A fully differential oscillator using FBAR resonators was designed and
will be taped out in early December. See Figure 45 for the layout of this oscillator.

There are numerous advantages to using a differential oscillator topology, including better power
supply rejection and increased signal swings. Larger signal swings allow for better phase noise
performance and a more efficient power amplifier.

Figure 45: Differential RF oscillator


2.C.3.7  Energy  scavenging 

Author:  Shad  Roundy 

Table 3 shows a broad comparison of potential power sources. The table is divided into two
sections, separating sources that provide a fixed level of power from those that are fundamentally
energy reservoirs. Based on this broad survey, it was decided to pursue both solar power and
vibration conversion. Figure 46 shows a graph of average power output per cubic centimeter
versus lifetime for solar, vibrations, and several battery chemistries. The power output of both
solar and vibration-based sources depends on the particular light or vibration source. Thus, the
boxes shown in the figure are meant to give the practical range. Figure 46 shows that for devices
with a lifetime of only a few years, primary batteries provide an average power
density comparable to that of solar and vibration sources. However, for longer lifetimes both solar power and
vibrations can be attractive for certain applications. Solar power technology is mature and can be
implemented with off-the-shelf items. Therefore, the more detailed research and development
work has been done on vibration converters.

A wide variety of vibration sources have been considered and measured, including HVAC ducts,
large industrial equipment, small household appliances, large exterior windows, office building
floors, and automobiles. A representative vibration input based on all the sources measured is
2.25 m/s² (0.23 g) focused at 120 Hz. The power output values presented are therefore for this
particular vibration source, which is representative of many of those measured and falls about in
the middle in terms of potential for power conversion.
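
To give a feel for how small this representative vibration is, the sketch below converts the quoted acceleration into units of g and into a displacement amplitude. It assumes a pure sinusoid at 120 Hz and treats 2.25 m/s² as the peak acceleration; both are simplifying assumptions, and the function names are illustrative.

```python
import math

G = 9.81  # standard gravity in m/s^2 (rounded)


def acceleration_in_g(accel_ms2):
    """Express an acceleration in multiples of standard gravity."""
    return accel_ms2 / G


def displacement_amplitude_um(accel_ms2, freq_hz):
    """Peak displacement (micrometers) of a sinusoid with the given peak acceleration."""
    omega = 2 * math.pi * freq_hz
    return accel_ms2 / omega ** 2 * 1e6


accel_g = acceleration_in_g(2.25)
displacement_um = displacement_amplitude_um(2.25, 120.0)
```

Under these assumptions the input corresponds to about 0.23 g and a displacement amplitude of roughly 4 micrometers.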

Comparison of Energy Scavenging Sources

                                   Power Density (µW/cm³)   Power Density (µW/cm³)
                                   1 Year lifetime          10 Year lifetime         Source of information

Solar (Outdoors)                   15,000 - direct sun      15,000 - direct sun      Commonly Available
                                   150 - cloudy day         150 - cloudy day
Solar (Indoors)                    6 - office desk          6 - office desk          Experiment
Vibrations                         100 - 200                100 - 200                Experiment and Theory
Acoustic Noise                     0.003 @ 75 dB            0.003 @ 75 dB            Theory
                                   0.96 @ 100 dB            0.96 @ 100 dB
Daily Temp. Variation              10                       10                       Theory
Temperature Gradient               15 @ 10 °C gradient      15 @ 10 °C gradient      Stordeur and Stark 1997
Shoe Inserts                       330                      330                      Starner 1996;
                                                                                     Shenck & Paradiso 2001

Batteries (non-recharg. Lithium)   89                       7                        Commonly Available
Batteries (rechargeable Lithium)   13.7                     0                        Commonly Available
Gasoline (micro heat engine)       403                      40.3                     Mehra et al. 2000
Fuel Cells (methanol)              560                      56                       Commonly Available

Table 3: Comparison of power sources for wireless sensor nodes


Figure 46: Graph of average power versus lifetime for solar,
vibrations, and several battery chemistries (plot title: Continuous Power / cm³ vs. Life for Several Energy Sources)
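
The difference between fixed-power sources and energy reservoirs in Table 3 and Figure 46 can be expressed with a simple idealized model, sketched below. It assumes that a reservoir holds a fixed amount of energy that is spread evenly over the lifetime, with no self-discharge or other losses; the function names are illustrative. The gasoline and fuel cell rows of Table 3 follow this scaling, whereas the battery rows do not, so the model should not be applied to them.

```python
def reservoir_power_uw(power_at_one_year_uw, lifetime_years):
    """Average power of an ideal fixed-energy reservoir spread over a lifetime."""
    return power_at_one_year_uw / lifetime_years


def crossover_years(reservoir_at_one_year_uw, scavenged_uw):
    """Lifetime beyond which a constant scavenged source gives more average power."""
    return reservoir_at_one_year_uw / scavenged_uw


# Values in µW/cm³ taken from Table 3.
gasoline_10yr = reservoir_power_uw(403.0, 10)
fuel_cell_vs_vibration = (crossover_years(560.0, 200.0),
                          crossover_years(560.0, 100.0))
```

In this idealized view a methanol fuel cell is overtaken by a 100 to 200 µW/cm³ vibration converter somewhere between roughly three and six years of operation, in line with the observation that scavenging becomes attractive for longer lifetimes.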


Three potential methods of coupling the mechanical kinetic energy to electrical energy exist:
inductive, capacitive (or electrostatic), and piezoelectric. Given the constraints of the
project (size, voltage, etc.), it appears that capacitive and piezoelectric converters are the most
attractive.

Piezoelectric converters based on bending elements have been modelled, simulated, built, and
tested. Figure 47 shows two different converters built using PZT (lead zirconate titanate)
bending elements. Experiments have validated the analytical model, which has subsequently
been used as a basis for design optimization. The maximum measured power output from
optimized designs is 300 µW/cm³ from a vibration source of 2.25 m/s² at 120 Hz. Power output
values of 100 to 200 µW/cm³ can be expected using more realistic power electronics.


Figure  47 :  Two  different  piezoelectric  generators  using  a  PZT  bender 
with  tungsten  alloy  proof  mass 


Electrostatic  converters  based  on  MEMS  have  been  designed  and  are  currently  being  fabricated 
and tested. Simulations show that a maximum of about 110 µW/cm³ can be generated from the
same  vibration  input  as  used  previously.  While  this  is  far  lower  than  the  potential  for 
piezoelectric  converters,  electrostatic  MEMS  converters  may  still  be  attractive  for  certain 
applications  because  of  their  greater  potential  for  integration  with  microelectronics.  Figure  48 
shows  SEM  images  of  a  preliminary  electrostatic  converter  prototype. 


Figure  48:  SEM  images  of  a  preliminary  electrostatic  converter  prototype 